Filtering workflow

ABSTRACT

Embodiments of the present disclosure are directed to a method for processing computer readable electronic files in an investigation in a computer system including a processor coupled to a display and an electronic storage device coupled to the processor. The method includes the processor accessing the electronic files and related data from a data source. The accessed files and related data are culled by the processor based on predetermined filter criteria. The processor stores the remaining files and related data in a third-party data repository and maps a set of electronic files and related data stored in the third-party data repository into a predetermined database schema. The mapped files and related data are analyzed by the processor, which applies a status decision on them. The analyzed electronic files and related data are submitted to a third-party e-discovery processing application based on the applied status decision.

CROSS REFERENCE TO RELATED APPLICATION

This application is based on and derives the benefit of the filing date of U.S. patent application Ser. No. 14/480,901, filed Sep. 9, 2014, which claims benefit to Provisional Patent Application No. 61/875,474, filed Sep. 9, 2013, the contents of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The presently disclosed subject matter generally relates to methods, systems, and apparatuses for data management, and more particularly to an interactive case management system.

BACKGROUND

Many situations call for the analysis of a body of electronically stored documents. One example is in electronic discovery. Electronic discovery (or e-discovery) may be referred to as the electronic aspect of identifying, collecting, and/or producing electronically stored information (ESI) performed in a manner that adheres to the established standards of evidence for the information to become admissible as legal evidence in a court of law. ESI may include, but is not limited to, emails, documents, presentations, databases, voicemails, audio and video files, social media, and web sites.

In the context of the discovery phase of litigation, an individual or organization (target party) may need to gather documents, such as documents in its possession, for submission to another party, such as an opposing party, in response to the opposing party's requests for production of documents (production requests). The production requests of the requesting party may cite categories of documents or types of information. Thus, the target party will evaluate its documents, such as documents in its possession, for those documents which are relevant to the cited categories of documents or types of information (relevant documents). Once gathered, the target party may further evaluate the relevant documents prior to production to the requesting party for various reasons, such as for the purpose of culling or segregating documents that may be subject to the attorney client privilege or attorney work product doctrine (privileged information). Once the documents are produced to the opposing party, the opposing party needs to analyze the documents.

SUMMARY

One exemplary embodiment may include a system and a method for assessing time-based anomalies in data represented by electronic computer readable files in a computer system including at least one processor and at least one electronic storage device coupled to the at least one processor. The method may include the at least one processor identifying all electronic files stored in the at least one electronic storage device obtained from one or more specified custodians of electronic files. The method may also include the at least one processor determining a date associated with each of the identified electronic files. The method may further include the at least one processor determining a number of electronic files associated with the specified custodians in each of a series of time segments over a period of time. Furthermore, the method may include the at least one processor causing at least one display coupled to the at least one processor to display the number of electronic files in each of the series of time segments. The method may also include the at least one processor causing the at least one display to illustrate those time segments with large and/or small numbers compared to other time segments.

Another exemplary embodiment may include a system and a method for processing computer readable electronic files in an investigation in a computer system including at least one processor, at least one electronic storage device coupled to the at least one processor and at least one display coupled to the at least one processor. The method may include the at least one processor accessing the electronic files and data related to the electronic files from a data source. The method may also include the at least one processor culling at least one of the accessed files and related data based on predetermined filter criteria. The method may further include the at least one processor storing the remaining files and related data in a third-party data repository. Furthermore, the method may include the at least one processor mapping a set of electronic files and related data stored in the third-party data repository into a predetermined database schema. The method may also include the at least one processor analyzing the mapped files and related data. The method may further include the at least one processor applying a status decision on the analyzed files and related data. The method may additionally include the at least one processor submitting at least one analyzed electronic file and related data to a third-party e-discovery processing application based on the applied status decision.

Other and further aspects and features of the disclosure will be evident from reading the following detailed description of the embodiments, which are intended to illustrate, not limit, the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated here and constitute a part of this specification, illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1A is a schematic that illustrates a first network environment including an exemplary interactive case management system, according to an embodiment of the present disclosure;

FIG. 1B is a schematic that illustrates a second network environment including the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 1C is a schematic that illustrates a third network environment including the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 1D is a schematic representation of the components of the exemplary interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 2 is a schematic that illustrates the exemplary interactive case management system of FIG. 1A in communication with network components, according to an embodiment of the present disclosure;

FIG. 3 is a flow chart illustrating an exemplary method for implementing a data intake module of the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 4A is a flow chart illustrating an exemplary method for performing filtering using hash values using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 4B is a flow chart illustrating an exemplary method of creating a reference hash table using the interactive case management system of FIG. 1A for implementing the method of FIG. 4A, according to an embodiment of the present disclosure;

FIG. 5 is a schematic that illustrates exemplary reference hash tables implemented using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 6 is a schematic that illustrates an exemplary method for implementing a platform connection module of the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 7 is a schematic that illustrates an exemplary search report generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIGS. 8A, 8B, and 8C illustrate exemplary alias tables generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 9 illustrates an exemplary email communication table generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIGS. 10A, 10B, and 10C illustrate exemplary communication diagrams generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 11 illustrates an exemplary timeline diagram generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 12 is a flow chart illustrating an exemplary method for implementing the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 13 is a flow chart illustrating an exemplary method for storing metadata implemented by the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 14 illustrates an exemplary metadata table, an extracted content table, and an inventory table generated using the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure;

FIG. 15 is a flow chart illustrating an exemplary method for filtering of data implemented by the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure; and

FIGS. 16 and 17 illustrate exemplary interface screens for the interactive case management system of FIG. 1A, according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

The following detailed description is made with reference to the figures. Exemplary embodiments are described to illustrate the disclosure, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a number of equivalent variations in the description that follows.

In various embodiments of the present disclosure, definitions of one or more terms that will be used in the document are provided below.

A “File” is used in the present disclosure in the context of its broadest definition. The file may refer to a computer readable, electronic file and related data in a variety of formats supporting storage, printing, or transfer of the file and related data over a communication channel. The file may be capable of being editable or non-editable, encrypted or decrypted, coded or decoded, compressed or decompressed, and convertible or non-convertible into different file formats and storage schemas. The file may be capable of being used by software applications to execute predetermined tasks.

A “Document” is used in the present disclosure in the context of its broadest definition. The document may refer to an electronic document including a single page or multiple pages. Each page may have text, images, embedded audios, embedded videos, embedded data files, or any combination thereof. The document may be a type of file.

A “Data Source” is used in the present disclosure in the context of its broadest definition. The data source may refer to a networked computing device, a computer readable medium, or a portable storage device configured to at least one of (1) store, manage, or process data or files, (2) establish a communication channel or environment, and (3) request services from or deliver services to, or both, other devices connected to a network.

A “Custodian” is used in the present disclosure in the context of its broadest definition. The custodian may refer to an entity, e.g., a human, a storage device, an artificial intelligence (AI) system, etc., responsible for, or having administrative control over, granting access to files or data while protecting the data as defined by a security policy or standard information technology (IT) practices in an e-discovery workflow.

A “Case” is used in the present disclosure in the context of its broadest definition. The case may refer to a named collection of files and related data associated with a particular custodian or a group of custodians. The case may pertain to a legal matter in the e-discovery workflow.

An “Index” is used in the present disclosure in the context of its broadest definition. The index may refer to a collection of one or more named references to files and related data stored in a database.

A “Search Term” is used in the present disclosure in the context of its broadest definition. The search term may refer to one or more strings of characters and/or numbers that may include Boolean logic operators or any other operator corresponding to or compatible with one or more computer programming languages. The search term may be a lowest level indicating minimum information reported for obtaining a search result.

A “User” is used in the present disclosure in the context of its broadest definition. The user may refer to an AI system or a person assigned access to and privilege within a computing device or system.

A “Filter Facet” is used in the present disclosure in the context of its broadest definition. The filter facet may refer to a category (e.g., dates, file types, etc.) that may be applied to review only a subset of the files and/or related data from a collection of case documents.

A “Role” is used in the present disclosure in the context of its broadest definition. The role may refer to a grouping of permissions assigned to the user.

The numerous references in the disclosure to an interactive case management system are intended to cover any and/or all devices capable of performing respective operations on data in the ESI workflow relevant to the applicable context, regardless of whether or not the same are specifically provided.

EXEMPLARY EMBODIMENTS

FIG. 1A is a schematic that illustrates a first network environment including an exemplary interactive case management system, according to an embodiment of the present disclosure. The first network environment 10-1 may include a data source 12 communicating with a third-party ESI, e.g., e-discovery, processing application 14 via an interactive case management system 16 over a network 18. The network 18 may include, for example, one or more of the Internet, Wide Area Networks (WANs), Local Area Networks (LANs), analog or digital wired and wireless telephone networks (e.g., a PSTN, Integrated Services Digital Network (ISDN), a cellular network, and Digital Subscriber Line (xDSL)), radio, television, cable, satellite, and/or any other delivery or tunneling mechanism for carrying data. Network 18 may include multiple networks or sub-networks, each of which may include, for example, a wired or wireless data pathway. The network 18 may include a circuit-switched voice network, a packet-switched data network, or any other network able to carry electronic communications. For example, the network 18 may include networks based on the Internet protocol (IP) or asynchronous transfer mode (ATM), and may support voice using, for example, VoIP, Voice-over-ATM, or other comparable protocols used for voice, video, and data communications.

The data source 12 may be implemented as any of a variety of computing devices (e.g., a desktop PC, a personal digital assistant (PDA), a server, a mainframe computer, a mobile computing device (e.g., mobile phones, laptops, etc.), an internet appliance, etc.), or a computer readable medium such as a smartcard, or a portable storage device (e.g., a USB drive, an external hard drive, etc.), and so on. The server may be implemented as any of a variety of computing devices including, for example, a general purpose computing device, multiple networked servers (arranged in clusters or as a server farm), a mainframe, or so forth.

The third-party ESI processing application 14 (hereinafter referred to as third-party application 14) may include a data repository 20, which may include or be sub-divided into various databases for storing electronic files. The data repository 20 may have one of many database schemas known in the art, related art, or developed later for storing data corresponding to the files from the data source 12 via the interactive case management system 16. For example, the data repository 20 may have a relational database schema involving a primary key attribute and one or more secondary attributes. The third-party application 14 may perform one or more operations such as reading, writing, indexing, updating, etc. on the data, and may communicate with various networked computing devices.

The interactive case management system 16 may be configured to at least one of: (1) communicate simultaneously with one or more third-party applications such as the third-party application 14, databases such as the data repository 20, or appliances operating using same or different communication protocols, formats, and database schemas, or any combination thereof; (2) index, filter, manipulate, and analyze data based on at least one predefined or dynamically created criterion; (3) transfer, receive, or map data for communication with one or more networked computing devices and data repositories; (4) associate data based on one or more attributes to create data sets; (5) generate customizable visual representations of data or data sets; (6) graphically represent data, data sets, or generated visual representations over a customizable timeline for one or more predetermined custodians and/or groups of custodians; (7) generate indications for a user and respond to indications from the user regarding the current status or state of files or data; (8) search, identify, extract, map, and use metadata associated with the files; and (9) store files and related data including metadata in a non-redundant manner.

The interactive case management system 16 may represent any of a wide variety of devices capable of providing case management services for the network devices. The interactive case management system 16 may be implemented as a standalone and dedicated “black box” including hardware and installed software, where the hardware is closely matched to the requirements and/or functionality of the software. Alternatively, the interactive case management system 16 may be implemented as a software application or a device driver. The interactive case management system 16 may enhance or increase the functionality and/or capacity of the network, such as the network 18, to which it is connected. In some embodiments, the interactive case management system 16 may be configured, for example, to perform e-mail tasks, security tasks, network management tasks including IP address management, and other tasks. In some other embodiments, the interactive case management system 16 may be configured to expose its computing environment or operating code to a user, and may include related art I/O devices, such as a keyboard or display. The interactive case management system 16 of some embodiments may, however, include software, firmware, or other resources that support remote administration and/or maintenance of the interactive case management system 16.

As illustrated in FIG. 1D, the interactive case management system 16 may be implemented by way of a single device (e.g., a computing device, processor or an electronic storage device) or a combination of multiple devices. The interactive case management system 16 may be implemented in hardware or a suitable combination of hardware and software. In some embodiments, the interactive case management system 16 may be a hardware device including processor(s) 22 executing machine readable program instructions for analyzing data, and interactions between the data source 12 and the data repository 20. The “hardware” may comprise a combination of discrete components, an integrated circuit, an application-specific integrated circuit, a field programmable gate array, a digital signal processor, or other suitable hardware. The “software” may comprise one or more objects, agents, threads, lines of code, subroutines, separate software applications, two or more lines of code or other suitable software structures operating in one or more software applications or on one or more processors. The processor(s) 22 may include, for example, microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 22 may be configured to fetch and execute computer readable instructions in a memory associated with the interactive case management system 16.

The interactive case management system 16 may manage interactions between the data source 12 and the third-party application 14 over the network 18. These interactions may include queries, instructions or data from the third-party application 14 to the data source 12 and/or the interactive case management system 16, and vice versa. The interactive case management system 16 may include a variety of known, related art, or later developed interface(s) 24, including software interfaces (e.g., an application programming interface, a graphical user interface, etc.); hardware interfaces (e.g., cable connectors, a keyboard, a card reader, a barcode reader, a biometric scanner, an interactive display screen, etc.); or both.

The interactive case management system 16 may further include an electronic storage device 26 for storing at least one of (1) a copy of files and related data including metadata; and (2) a log of profiles of network devices and associated communications including instructions, queries, data, and related metadata. The storage device 26 may comprise any computer-readable medium known in the art, related art, or developed later including, for example, volatile memory (e.g., RAM), non-volatile memory (e.g., flash, etc.), disk drive, etc., or any combination thereof. In one embodiment, the storage device 26 may include a database 28 having a predetermined schema and various modules such as a data intake module 30, a platform connection module 32, an email mapping module 34, and an advanced filtering module 36. The predetermined schema and these modules are discussed below in detail.

In some embodiments, the interactive case management system 16 may include, in whole or in part, a software application working alone or in conjunction with one or more hardware resources. Such software applications may be executed by the processor(s) 22 on different hardware platforms or emulated in a virtual environment. Aspects of the interactive case management system 16 may leverage known, related art, or later developed off-the-shelf software.

Other embodiments may comprise the interactive case management system 16 being integrated or in communication with a mobile switching center, network gateway system, Internet access node, application server, IMS core, service node, or some other communication systems, including any combination thereof. In some embodiments, the interactive case management system 16 may be integrated with or implemented as a wearable device including, but not limited to, a fashion accessory (e.g., a wrist band, a ring, etc.), a utility device (a hand-held baton, a pen, an umbrella, a watch, etc.), a body clothing, or any combination thereof.

In further embodiments, the interactive case management system 16, either in communication with the data source 12 or independently, may have video, voice, and data communication capabilities (e.g., unified communication capabilities) by being coupled to or including various imaging devices (e.g., cameras, printers, scanners, medical imaging systems, etc.), various audio devices (e.g., microphones, audio input devices, speakers, audio output devices, telephones, speaker telephones, etc.), various video devices (e.g., monitors, projectors, displays, televisions, video output devices, video input devices, cameras, etc.), or any other type of hardware, in any combination thereof. In some embodiments, the interactive case management system 16 may comprise or implement one or more real time protocols (e.g., session initiation protocol (SIP), H.261, H.263, H.264, H.323, etc.) and non-real time protocols known in the art, related art, or developed later to facilitate data transfer among the data source 12, the third-party application 14, and any other network device.

In some embodiments, the interactive case management system 16 may be configured to convert communications, which may include instructions, queries, data, etc., from the data source 12 into appropriate formats to make these communications compatible with the third-party application 14, and vice versa. Consequently, the interactive case management system 16 may allow implementation of the data repository 20 using different technologies or by different organizations, e.g., a third-party vendor, managing the data repository 20 using a proprietary technology.

In another embodiment (FIG. 1B), the interactive case management system 16 may be integrated with, or installed on, the data source 12. In yet another embodiment (FIG. 1C), the interactive case management system 16 may be installed on or integrated with any network appliance 38 configured to establish the network 18 between the data source 12 and the data repository 20. At least one of the interactive case management system 16 and the network appliance 38 may be capable of operating as or providing an interface to assist exchange of software instructions and data among the data source 12, the data repository 20, and the interactive case management system 16. In some embodiments, the network appliance 38 may be preconfigured or dynamically configured to include the interactive case management system 16 integrated with other devices. For example, the interactive case management system 16 may be integrated with the data source 12 (as shown in FIG. 1B), third-party application 14 or any other user device (not shown) connected to the network 18. The data source 12 may include a module (not shown), which enables the data source 12 to be introduced to the network appliance 38, thereby enabling the network appliance 38 to invoke the interactive case management system 16 as a service. Examples of the network appliance 38 may include, but are not limited to, a DSL modem, a wireless access point, a router, a base station, and a gateway having a predetermined computing power sufficient for implementing the interactive case management system 16.

FIG. 2 is a schematic that illustrates the exemplary interactive case management system of FIG. 1A in communication with network components, according to an embodiment of the present disclosure. The interactive case management system 16 may interact with various network components and devices such as the data source 12 and the third-party application 14. In one embodiment, the interactive case management system 16 may include the data intake module 30, the platform connection module 32, and the email mapping module 34. The interactive case management system 16 may also comprise the advanced filtering module 36 including a filter module 40, an analysis and visualization module 42 (AV module 42), and a decision module 44. The AV module 42 may include a file-type analysis module 46, an email communication analysis module 48 (ECA module 48), a data anomaly analysis module 50, a domain analysis module 52, and a DeDuplication module 54.

Data Intake Module

The data intake module 30 may be configured to interface between the data source 12 and the third-party application 14. The data intake module 30 may implement a predetermined process (FIG. 3) for preventing electronic files and related data that are irrelevant to an e-discovery request or other investigations from being ingested into the data repository 20.

At step 55, electronic files and related data located in the data source 12 are accessed. The data intake module 30 may access a collection of unprocessed, electronic, computer readable files from the data source 12. The data source 12 may present the files as a set of loose files in a computer readable file system to the data intake module 30. Examples of these files may include, but are not limited to, system files, program files, document files, multimedia files, and emails. The files may be accessed as being related to a legal case or a custodian.

At step 56, file system information of the accessed files is collected. The data intake module 30 may determine various file system information associated with the accessed files using tools and techniques known in the art, related art, or developed later. Examples of the file system information may include, but are not limited to, filename, file path, file type, system date, etc. The determined file system information may be stored as a record for each of the accessed files in the database 28. The data intake module 30 may create a separate such record for each case or custodian. A collection of these records may be arranged in a table referred to as an intake table.
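As a minimal sketch of step 56, the following Python function walks a directory and builds one record per file. The function name `collect_file_info`, the record field names, and the use of a list of dictionaries in place of the intake table in the database 28 are illustrative assumptions, not part of the disclosure.

```python
import os
from datetime import datetime, timezone


def collect_file_info(root_dir, case_id):
    """Return one intake record (a dict) per file found under root_dir.

    Field names are hypothetical; the disclosure lists filename, file path,
    file type, and system date only as examples.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            records.append({
                "case_id": case_id,
                "filename": name,
                "file_path": path,
                # The extension is only a hint; step 58 identifies the real type.
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            })
    return records
```

In practice the returned records might be written to a database table keyed by case or custodian rather than held in memory.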

At step 57, a hash value is calculated for each of the accessed files and related data using a predetermined hashing algorithm. The data intake module 30 may apply the predetermined hashing algorithm to the accessed files for calculating a hash value for each of these files. The hashing algorithm may transform a string of characters in the files into a shorter fixed-length value or key called a hash value (or hash code) that represents the original character string in the file. The length of the hash value may vary based on the applied hashing algorithm. Such determination of the hash values may be employed to label the files, wherein the label may facilitate determining the relevancy of the files for the e-discovery workflow or investigations.
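The hashing step can be sketched with Python's standard `hashlib` module. The disclosure does not name a specific algorithm, so the choice of SHA-256 as the default (and MD5 in the usage note) is an assumption, as is the helper name `hash_stream`.

```python
import hashlib


def hash_stream(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary file-like object in chunks and return the hex digest.

    Reading in chunks keeps memory use flat for large files.
    """
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()
```

Calling `hash_stream(open(path, "rb"))` on a file yields a 64-character hex string for SHA-256, while `algorithm="md5"` yields 32 characters, which illustrates how the length of the hash value depends on the algorithm applied.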

At step 58, a file type for each of the accessed files is identified. Each of the accessed files may be analyzed by the data intake module 30 to identify the file types, which may refer to formats of the files. Examples of such file types may include, but are not limited to, PDF, PST email database, MS WORD processing, MOV video, WAV audio, and TIFF image. In one embodiment, the accessed files may be analyzed by grouping together the files based on the file type (hereinafter referred to as file type groupings). Each of the file type groupings may be marked with a predefined code, which may identify the ‘type’ of the electronic files in a particular grouping. Such identification of file types may be performed using a variety of techniques and tools known in the art, related art, or developed later. For example, ‘File Investigator Tools’ developed by Forensic Innovations, Inc. may be implemented by the data intake module 30 to identify the file types and generate the corresponding predefined codes. The data intake module 30 may map the generated codes for each of the identified file types into the intake table or a separate file type table, which may be stored in the database 28.
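One way to picture file type groupings is a check of each file's leading bytes against well-known format signatures. The sketch below is not the File Investigator Tools API; it substitutes a simple signature check for three of the listed formats, and the string codes (`"PDF"`, `"TIFF"`, `"WAV"`, `"UNKNOWN"`) and function names are hypothetical.

```python
from collections import defaultdict


def identify_type(header: bytes) -> str:
    """Return a hypothetical type code based on a file's leading bytes."""
    if header.startswith(b"%PDF-"):
        return "PDF"
    # TIFF files start with a little-endian or big-endian byte-order mark.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "TIFF"
    # WAV is a RIFF container whose form type at offset 8 is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    return "UNKNOWN"


def group_by_type(headers_by_name):
    """Group file names by identified type code (the file type groupings)."""
    groups = defaultdict(list)
    for name, header in headers_by_name.items():
        groups[identify_type(header)].append(name)
    return dict(groups)
```

Because the check reads content rather than the file extension, a PDF renamed to `.txt` would still land in the PDF grouping.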

At step 60, the identified files may be filtered based on a variety of predefined or dynamically defined filter criteria to eliminate electronic files, which may be considered as irrelevant or non-responsive for the investigations. In one example, the files may be filtered based on a predefined or dynamically defined file path referring to a location (e.g., the data source 12) from which the file was obtained. When a ‘file path’ criterion is applied, the data intake module 30 may provide all the files, which were stored at a particular location defined by the file path, as a filter result. In some embodiments, the data intake module 30 may be configured to exclude all the files that were stored at the location defined by the file path and, in some other embodiments, provide those files as the filter result.
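The ‘file path’ criterion, with its include and exclude variants, might be sketched as follows. The record layout (a `file_path` key holding a POSIX-style path), the `exclude` flag, and the function name are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def filter_by_path(records, base_path, exclude=False):
    """Keep records located under base_path, or drop them when exclude=True."""
    base = PurePosixPath(base_path)
    result = []
    for rec in records:
        path = PurePosixPath(rec["file_path"])
        # Compare whole path components so "/mail" does not match "/mailbox".
        under_base = path == base or base in path.parents
        if under_base != exclude:
            result.append(rec)
    return result
```

Comparing path components rather than string prefixes avoids accidentally matching sibling directories whose names merely begin with the same characters.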

In another example, the files may be filtered based on a date range extending between a reference date and a desired date, both dates being inclusively or exclusively considered for returning a filter result. The reference date may refer to a date on which the corresponding file system was either created or modified on the data source 12. The desired date may refer to any date after the reference date, e.g., the latest date or the date on which such filtering is being performed. When a ‘date range’ criterion is applied, the data intake module 30 may provide all the files that were created or modified between the reference date and the desired date, both inclusive, as a filter result. In some embodiments, the data intake module 30 may be configured to exclude all the files that were created or modified between the reference date and the desired date, both inclusive, and, in some embodiments, provide the remaining files as the filter result.

In yet another example, the files may be filtered based on one or more selected file types. The code for the selected file types may be determined using a file type identification tool, such as that mentioned above. The determined code may be compared against the codes in the file type table or the intake table for filtering the files. When a ‘file type’ criterion is applied, the data intake module 30 may provide all the files whose associated file-type codes match the codes in the file type table or the intake table as a filter result. In some embodiments, the data intake module 30 may be configured to exclude all the files whose associated file-type codes match the codes in one of these tables and, in some other embodiments, provide the remaining files as the filter result.

In a further example, the files may be filtered based on hash values of the accessed files. The data intake module 30 may compare the calculated hash value associated with each file against hash values that are listed in one or more reference hash tables as being irrelevant to ESI investigations. Those files having hash values matching hash values in the reference hash tables may be designated as non-responsive or irrelevant.

The above-mentioned exemplary filter criteria may be applied in any order by the data intake module 30 upon a user request or selection. In one embodiment, the data intake module 30 may apply the criteria in a preset order, namely, file path, then date range, then file type, then hash value, upon receiving a request from the user.
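The preset ordering described above can be pictured as a chain of predicates applied one after another. The following Python sketch is illustrative only and is not the disclosed implementation: the `FileRecord` fields, the function names, the sample paths, and the choice to treat every criterion as an inclusion test (rather than an exclusion test, which the embodiments also permit) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical intake record; field names are illustrative only.
    path: str
    modified: date
    type_code: str
    md5: str


Criterion = Callable[[FileRecord], bool]


def path_criterion(prefix: str) -> Criterion:
    # Keep files stored under the given location.
    return lambda f: f.path.startswith(prefix)


def date_range_criterion(start: date, end: date, inclusive: bool = True) -> Criterion:
    # Both end dates may be considered inclusively or exclusively.
    if inclusive:
        return lambda f: start <= f.modified <= end
    return lambda f: start < f.modified < end


def file_type_criterion(selected_codes: Set[str]) -> Criterion:
    # Keep files whose file-type code is among the selected codes.
    return lambda f: f.type_code in selected_codes


def hash_criterion(irrelevant_hashes: Set[str]) -> Criterion:
    # Keep a file only when its hash is NOT listed as irrelevant.
    return lambda f: f.md5.lower() not in irrelevant_hashes


def apply_criteria(files: Iterable[FileRecord], criteria: List[Criterion]) -> List[FileRecord]:
    # Criteria run in list order; a file must satisfy all of them to remain.
    remaining = list(files)
    for criterion in criteria:
        remaining = [f for f in remaining if criterion(f)]
    return remaining


files = [
    FileRecord("/share/hr/a.pdf", date(2014, 3, 1), "PDF", "aa" * 16),
    FileRecord("/share/hr/b.exe", date(2014, 3, 2), "EXE", "bb" * 16),
    FileRecord("/share/it/c.pdf", date(2014, 3, 3), "PDF", "cc" * 16),
    FileRecord("/share/hr/d.pdf", date(2012, 1, 1), "PDF", "dd" * 16),
]

# Preset order: file path, then date range, then file type, then hash value.
preset_order = [
    path_criterion("/share/hr/"),
    date_range_criterion(date(2013, 1, 1), date(2014, 12, 31)),
    file_type_criterion({"PDF"}),
    hash_criterion({"bb" * 16}),
]
kept = apply_criteria(files, preset_order)
print([f.path for f in kept])
```

Because each criterion only narrows the remaining set, the final result is the same for any ordering; the order mainly affects how much work the later, more expensive criteria have to do.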

At step 74, if the accessed files satisfy all of the selected filter criteria, the files returned as a positive filter result may be referred to as files possibly relevant for the e-discovery workflow or investigations. The data intake module 30 may copy the obtained filter result including files and related data to a predetermined storage location such as the data repository 20. In some embodiments, the data intake module 30 may also store a copy of the relevant files and related data into the database 28. In some other embodiments, once the files and related data are stored in the data repository 20, the data intake module 30 may generate an intake summary report. This report may show a high-level overview of all the files and related data that may be stored in the database 28, and may indicate (1) the files and related data filtered out by the applied custom predetermined filter criteria; (2) the files and related data that were not copied into at least one of the data repository 20 and the database 28 due to an intake exception; and (3) the files and related data (in the data source 12) that may not be accessed or used by the interactive case management system 16.

At step 76, the rest of the files (hereinafter referred to as irrelevant files) in the data source 12, which may not be returned as a positive filter result upon applying the predetermined filter criteria, may be associated with an indicator such as a label stating “Filtered” by the data intake module 30. The indicator may identify the irrelevant files as being already subjected to the filter criteria at least once by the data intake module 30. These irrelevant files and related data may not be copied from the data source 12 to the predetermined storage location such as the data repository 20 or the database 28.

At step 78, a record is created for the filter result in the intake table. The data intake module 30 may create a record in the intake table for each of the relevant and irrelevant files, and related data. The record may include, without limitation, the filtering information about each of the relevant files and the irrelevant files. Examples of the filtering information may include, but are not limited to, a list of values inputted for various filter criteria such as those mentioned above. In some embodiments, the data intake module 30 may embed files corresponding to the filtering information in the intake table.

FIG. 4B illustrates an example of the process by which processor 22 or any other processor creates one or more reference hash tables to be used or referred to by the interactive case management system 16. For example, at step 80, a predetermined hashset in the National Software Reference Library (NSRL) database is accessed. The hashsets are collections of hash values for files known to be irrelevant to investigations (e.g., .exe files, .dll files, etc. for known programs). The processor 22 may access at least one of a variety of predetermined hashsets provided by the National Institute of Standards and Technology (NIST). For example, the processor 22 may use the “minimal” hashset, which includes only one example of every file in the NSRL.

At step 82, an MD5 hash value in the accessed hashset is read. The processor 22 may read the “minimal” hashset file-by-file to determine hash values corresponding to the predetermined hashing algorithm, such as the MD5 hashing algorithm.

The number of hash values in the NSRL database is large. If all hash values in the NSRL database are provided in a single reference hash table, the time necessary to compare a hash value to all hash values stored in the single reference hash table may be quite long. To shorten the comparing process, a plurality of reference hash tables may be employed, each for a different segment of the hash values.

At step 84, values of predetermined digits in the MD5 hash value are determined. The read MD5 hash values may be represented in the hexadecimal numbering system. The processor 22 may segregate the MD5 hash values into different reference tables based on the predetermined digits in the hexadecimal MD5 hash values. For example, the read MD5 hash values may be separated into 256 reference hash tables based on the first two hexadecimal digits. The number of predetermined digits may vary from one to ‘X’, where ‘X’ is less than the maximum number of digits in a particular hash value, such as the MD5 hash value.

The processor 22 may determine the values of the predetermined digits, such as the values of the first two hexadecimal digits, which may range from 00 to FF in the MD5 hash values. The processor 22 may create a separate reference hash table for each value of the first two hexadecimal digits. In one example, the processor 22 may create 256 reference hash tables 102-1, 102-2, . . . , 102-256 (collectively, reference hash tables 102 as shown in FIG. 5) based on the first two hexadecimal digits in the MD5 hash values.

At step 86, the processor 22 may determine whether an MD5 hash value read from the NSRL hashset exists in the appropriate reference hash table as indexed by the predetermined digits. In one embodiment (FIG. 5), each reference hash table may be named using the same first two hexadecimal digits. For example, as shown, a reference hash table 102-256 may be named ‘MD5_FF_HashCode’, where ‘MD5’ may refer to the MD5 hashing algorithm, ‘FF’ may correspond to the first two hexadecimal digits of the MD5 hash values contained in the table 102-256, and ‘HashCode’ may refer to the type of content, i.e., hash values, stored in that table 102-256.

For each read MD5 hash value, the processor 22 may identify the corresponding reference hash table based on the values of the predetermined digits, such as the first two hexadecimal digits. In one example, when the value of the first two hexadecimal digits is “02”, the processor 22 may identify the table 102-3 as the corresponding reference hash table. The data intake module 30 may then check whether the read MD5 hash value exists in the identified reference hash table.

At step 88, if the read MD5 hash value is not in the appropriate reference hash table, the appropriate reference hash table is updated to include the read MD5 hash value. For example, if the read MD5 hash value is not found in the table 102-3, the processor 22 may update that table to include the read MD5 hash value. Processing proceeds to step 90.

At step 90, if the read MD5 hash value is found in the appropriate reference hash table, then the read MD5 hash value may not be added to the reference hash tables. Therefore, the processor 22 may read the next MD5 hash value in the accessed hashset, such as the “minimal” hashset, and repeat the steps 84 to 90. In this manner, the data intake module 30 may store every MD5 hash value in the accessed hashset into one of the 256 tables based on the determined values of the predetermined digits in the hash values to create a complete set of reference hash tables.
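Steps 84 to 90 amount to partitioning the hashset by a fixed-length prefix and inserting each value only if it is absent. A minimal Python sketch follows, with in-memory sets standing in for database tables; the function name, the `prefix_len` parameter, and the sample hash strings are assumptions for illustration, while the table-name pattern follows the ‘MD5_FF_HashCode’ example of FIG. 5.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_reference_tables(md5_hashes: Iterable[str], prefix_len: int = 2) -> Dict[str, Set[str]]:
    """Partition MD5 hex strings into tables keyed by their leading digits."""
    tables: Dict[str, Set[str]] = defaultdict(set)
    for raw in md5_hashes:
        value = raw.strip().upper()
        # Step 84: read the predetermined digits (here, the first two).
        name = f"MD5_{value[:prefix_len]}_HashCode"
        # Step 86: check the appropriate table only, not every table.
        if value not in tables[name]:
            # Step 88: add the value when it is not yet present.
            tables[name].add(value)
        # Step 90: otherwise nothing is added and the next value is read.
    return dict(tables)


# Made-up 32-digit values; the last two repeat earlier entries.
sample_hashset = [
    "02" + "0" * 30,
    "FF" + "1" * 30,
    "02" + "0" * 30,
    "ff" + "1" * 30,
]
tables = build_reference_tables(sample_hashset)
print(sorted(tables))
```

With two hexadecimal digits there are at most 16 × 16 = 256 tables, so each lookup searches roughly 1/256 of the values if the hashes are evenly distributed.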

Once the reference hash tables are created, they can be used to assess whether files (whose hash values are determined at step 57 in FIG. 3) are irrelevant to the investigation or may potentially be relevant (as determined at step 60 in FIG. 3). Exemplary steps for filtering the accessed files and related data using hash values are shown in FIG. 4A. At step 62, the electronic files and related data located in the data source 12 are received. The data intake module 30 may receive the files for which a hash value is to be determined. The data intake module 30 may employ various hashing algorithms known in the art, related art, or developed later, including SHA algorithms, for determining the corresponding hash value for each of the files.

At step 64, an MD5 hash value is calculated for each of the received files and related data. In one embodiment, the data intake module 30 may apply the MD5 hashing algorithm to generate a 128-bit (16-byte) MD5 hash value for each of the received files. The calculated MD5 hash values may be expressed in text format as a 32-digit hexadecimal number; however, other numbering systems known in the art, related art, or developed later, including the binary numbering system, the decimal numbering system, or any combination thereof, may be used for representing the hash values.
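As a concrete illustration of step 64, the sketch below computes an MD5 digest with Python's standard `hashlib` module and renders it as 32 hexadecimal digits. Reading the input in chunks and the `md5_hex` helper name are assumptions of this example, not details taken from the disclosure.

```python
import hashlib
import io
from typing import BinaryIO


def md5_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 digest of a binary stream as 32 hexadecimal digits."""
    digest = hashlib.md5()
    # Read in chunks so large files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


value = md5_hex(io.BytesIO(b"example file content"))
print(len(value), "hex digits =", len(value) * 4, "bits")
```

A file on disk could be hashed the same way by passing the object returned by `open(path, "rb")`.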

At step 66, the calculated MD5 hash value is compared with the reference hash table in which all hash values have the same predetermined digits as the hash value calculated in step 64.

At step 68, the data intake module 30 may check whether the calculated MD5 hash value for each file in the data source 12 exists in the appropriate reference hash table. At step 70, if the MD5 hash value is found in the appropriate reference hash table, the corresponding file may be marked by a variety of indicators known in the art, related art, or developed later including textual indicators (e.g., alphabets, numerals, strings, special characters, etc.), non-textual indicators (e.g., different colors, color luminance, patterns, textures, graphical objects, etc.), or any combination thereof. For example, the file may be marked with a label stating “Filtered by NSRL”, which may indicate that the file is not relevant to the e-discovery request or investigations.

However, in one embodiment, if the calculated MD5 hash value is not found in any of the reference hash tables, at step 72, the corresponding file may be left unmarked, indicating that the file may be relevant to the investigation. Such unmarked files may be re-evaluated, e.g., by a user, to ascertain the relevancy of the file for the e-discovery workflow or investigation. The data intake module 30 may be configured to generate a log of the irrelevant files and related data.
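Steps 66 to 72 can be sketched as a single lookup against the table selected by the leading digits of the calculated hash. The Python example below assumes the tables are held as in-memory sets keyed by the ‘MD5_xx_HashCode’ naming pattern; the `nsrl_label` function, the file names, and the hash strings are hypothetical.

```python
from typing import Dict, Optional, Set


def nsrl_label(md5_hex: str, tables: Dict[str, Set[str]]) -> Optional[str]:
    """Return the marking for a file hash, or None to leave the file unmarked."""
    value = md5_hex.strip().upper()
    # Step 66: select the one table that shares the first two hex digits.
    table = tables.get(f"MD5_{value[:2]}_HashCode", set())
    # Steps 68-72: mark on a match, otherwise leave unmarked for review.
    return "Filtered by NSRL" if value in table else None


# Made-up reference data: one known hash in the "02" table.
known = "02" + "A" * 30
tables = {"MD5_02_HashCode": {known}}

labels = {
    "setup.exe": nsrl_label(known.lower(), tables),
    "memo.docx": nsrl_label("9F" + "B" * 30, tables),
}
print(labels)
```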

Since the data intake module 30 may filter out the irrelevant files and related data from the data source 12, the data intake module 30 may provide significant cost savings for managing and processing the relevant files and related data being ingested into the third-party application 14. Additionally, the data intake module 30 may facilitate communication of the current case status to clients or information requestors. Further, the data intake module 30 may be automated using preset filter criteria and login credentials being dynamically communicated to the data source 12 and the third-party application 14 for enhanced reporting, error reduction, and better productivity.

Platform Connection Module

The platform connection module 32 may communicate with the data repository 20 over the network 18. The data repository 20 may store files and related data in a variety of formats and schemas known in the art, related art, or developed later including proprietary file systems and database schemas. The platform connection module 32 may be implemented as illustrated in FIG. 6.

At step 110, files and related data stored in the data repository 20 are accessed. In one embodiment, the platform connection module 32 may be configured to log in to the third-party application 14 using predefined or dynamically defined login credentials, e.g., a username and password, to gain access to the files and related data stored in the data repository 20. In other embodiments, the platform connection module 32 may be configured to use a variety of access techniques known in the art, related art, or developed later including predefined or dynamically provided biometric data (e.g., fingerprint, retina scan, etc.), audio data (e.g., voice), and video data (e.g., face scan, picture scan, etc.). After login, the platform connection module 32 may determine the type of database schema and the type of file system implemented on the data repository 20. Additionally, the platform connection module 32 may receive database information including, but not limited to, specific database instance and file share location of the data repository 20 from a user. The platform connection module 32 may store this ‘type’ and the database information about the data repository 20 in a configuration record in the database 28.

The platform connection module 32 may be configured to interact with the third-party application 14 using various access protocols or technologies known in the art, related art, or developed later including SQL queries. The platform connection module 32 may use the configuration record to interact with the data repository 20. The data repository 20 may include a table, hereinafter referred to as DR table, including metadata of electronic files stored in the data repository 20 as well as the electronic files themselves. The platform connection module 32 may refer to the DR table (not shown) to determine the location of extracted, optical character recognition (OCR) converted, or any other type of data.

At step 112, the platform connection module 32, in one embodiment, may be configured to map the read files and related data into a predefined schema of the database 28 such that the files and related data are usable by the interactive case management system 16 or any other reporting and filtering application or system compatible with the interactive case management system 16. For this, the platform connection module 32 may parse the accessed data (e.g., body of an email, content of a word file, a file embedded in another file, etc.) and related metadata for insertion into one or more tables in the predefined schema of database 28. In one example of an email file, the platform connection module 32 may parse the corresponding email address fields stored in the database schema of the data repository 20. The email address fields may be parsed into individual email addresses, email domains (e.g., text after “@” symbol for SMTP addresses, text after “O” portion of the x500 addresses, etc.), and sender-recipient pairs. In one embodiment, the platform connection module 32 may insert the individual email addresses into an alias table, the email domains into a domains table, and each sender-recipient pair into an email communication table. In another embodiment, the platform connection module 32 may insert the parsed data (e.g., individual email addresses, email domains, etc.) and related metadata (e.g., filenames, hash values, size, etc.) in an inventory table. Various modules of the interactive case management system 16 may use the inventory table to map a predetermined set of files and related data into predefined tables such as the alias table, the domains table, and the email communication table, for analyses, reporting, filtering, or any other operation. Each of the alias table, the domains table, the email communication table, and the inventory table is discussed below in greater detail.
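The parsing of email address fields into addresses, domains, and sender-recipient pairs can be sketched in Python as follows. This is a simplified illustration under several assumptions: fields are taken to be semicolon-delimited strings, only SMTP-style addresses are handled (x500 addresses are not), CC and BCC entries are treated as recipients, and the function names are hypothetical. The sample values are borrowed from the FIG. 8A example.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def split_addresses(field: str) -> List[str]:
    # Assumes a semicolon-delimited field; real repositories may differ.
    return [part.strip().lower() for part in field.split(";") if part.strip()]


def smtp_domain(address: str) -> str:
    # Text after the "@" symbol; empty when the address has no "@".
    return address.rsplit("@", 1)[1] if "@" in address else ""


def parse_email_record(record: Dict[str, str]) -> Tuple[List[str], Set[str], List[Tuple[str, str]]]:
    senders = split_addresses(record.get("FROM", ""))
    recipients = [
        address
        for name in ("To", "CC", "BCC")
        for address in split_addresses(record.get(name, ""))
    ]
    # Individual addresses, de-duplicated while preserving first-seen order.
    addresses = list(dict.fromkeys(senders + recipients))
    domains = {smtp_domain(a) for a in addresses} - {""}
    # One pair per (sender, recipient) combination.
    pairs = list(product(senders, recipients))
    return addresses, domains, pairs


record_1 = {
    "FROM": "anthony.j.thomas@gabco.ne",
    "To": "luke.daniels@gabco.ne; seth.andrews@gabco.ne",
    "CC": "roses9009@online.ne",
    "BCC": "misaac8795@online.ne",
}
addresses, domains, pairs = parse_email_record(record_1)
print(len(addresses), sorted(domains), len(pairs))
```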

In another example, the database schema of the data repository 20 may include a table having fields “Author_Name” and “Email_Subject” and the predefined schema of database 28 may have a table having fields “Author” and “Subject”. The platform connection module 32 may read files and map the related data from the “Author_Name” field to the corresponding “Author” field of the predefined schema of database 28. Similarly, the platform connection module 32 may map the data from the “Email_Subject” field to the corresponding “Subject” field of the predefined schema of database 28.
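A field-level mapping of this kind can be expressed as a lookup from source field names to target field names. In the Python sketch below, the `FIELD_MAP` dictionary, the `map_row` helper, and the sample row (including the extra `Internal_Flag` field) are hypothetical; only the four field names come from the example above.

```python
from typing import Dict

# Source field in the data repository schema -> field in the predefined schema.
FIELD_MAP: Dict[str, str] = {
    "Author_Name": "Author",
    "Email_Subject": "Subject",
}


def map_row(source_row: Dict[str, str], field_map: Dict[str, str] = FIELD_MAP) -> Dict[str, str]:
    """Copy mapped fields into the target schema; unmapped fields are dropped."""
    return {
        target: source_row[source]
        for source, target in field_map.items()
        if source in source_row
    }


source_row = {
    "Author_Name": "A. Smith",
    "Email_Subject": "Quarterly report",
    "Internal_Flag": "x",
}
mapped = map_row(source_row)
print(mapped)
```

Reverse mapping, where supported, could use the inverted dictionary `{target: source for source, target in FIELD_MAP.items()}`.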

In some embodiments, the platform connection module 32 may reverse map information associated with the mapped files and related data from the predefined schema of the database 28 to that of the data repository 20. Such reverse-mapped information may include data added by the interactive case management system 16. Also, such reverse-mapped information may be tagged to facilitate tracking of the tagged data, which is mapped in the predefined schema of database 28. Examples of the reverse-mapped information may include, but are not limited to, custodian ID, media ID, data source ID, work package ID, and so on. At step 114, the mapped files and related data in the predefined schema may be stored in the database 28.

In some other embodiments, the platform connection module 32 may be configured to index the mapped files and related data using a variety of types of indexes known in the art, related art, or developed later. Examples of the types of indexes may include, but are not limited to, clustered, non-clustered, hash, unique, spatial, and so on. In one embodiment, the platform connection module 32 may be configured to create a full-text index including metadata corresponding to the mapped files. The platform connection module 32 may use the full-text index to support full-text searching of various data records stored in the database 28.

Search Capability and Search Report

Based on a search term or metadata element inputted by a user, the interactive case management system 16 may use SQL queries to search for relevant data records and the corresponding files using the full-text index. The interactive case management system 16 may employ various tools, techniques, and syntax known in the art, related art, or developed later, including “dtSearch” searching technology, to implement full-text searching.

Each of the search terms used for searching the data records may be stored as metadata for a corresponding file in the database 28. As such, the interactive case management system 16 may generate statistical reports including search results. In one example shown in FIG. 7, a search report 120 may be displayed based on one or more predetermined categories including (1) one or more terms used for searching the data records in the database 28, and/or (2) a custodian of the files corresponding to the searched data records. Under each category, the search report 120 may be represented under various columns: “Total Hits” may refer to the total number of instances in which the searched term was found; “Docs” may refer to the total number of distinct documents or files containing the searched term; “Size” may refer to the total size (e.g., in gigabytes) of the “Docs”; “Docs w/Families” may refer to the total number of distinct documents after the document families are expanded to include all related items, such as attachments, of the “Docs”; and “Family Size” may refer to the total size (e.g., in gigabytes) of the “Docs w/Families”. Additionally, in one embodiment, the search results may be represented under the columns named “Unique Docs” and “Unique Size”. The column “Unique Docs” may refer to the total number of documents or files that are hit exclusively by a given search term and by none of the other terms in a search query. For instance, as shown, the search term “Time” may result in an exclusive search hit on “35,181” documents on which none of the other terms were found to hit. In other words, entries under the column “Unique Docs” may represent ‘Search Term Impact’, indicating that if a particular search term is removed, the corresponding number of documents or files under this column would be dropped from the “Total Hits”. The column “Unique Size” may refer to the total size (e.g., in gigabytes) of the “Unique Docs”.
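The “Total Hits”, “Docs”, and “Unique Docs” columns can be derived from per-document hit counts, as the following Python sketch suggests. The input layout (`hits[term][doc_id]`), the function name, and the sample data are assumptions; the size and family columns are omitted because they would need file-size and family-membership data that this small example does not model.

```python
from typing import Dict, Set


def search_report(hits: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Summarize hits[term][doc_id] = occurrences of the term in that document."""
    report: Dict[str, Dict[str, int]] = {}
    for term, per_doc in hits.items():
        # Documents hit by any OTHER term in the same query.
        other_docs: Set[str] = set()
        for other_term, docs in hits.items():
            if other_term != term:
                other_docs.update(docs)
        report[term] = {
            "Total Hits": sum(per_doc.values()),
            "Docs": len(per_doc),
            # Documents that only this term hits ("Search Term Impact").
            "Unique Docs": len(set(per_doc) - other_docs),
        }
    return report


# Made-up hit counts for a two-term query.
hits = {
    "time": {"d1": 3, "d2": 1, "d3": 2},
    "budget": {"d2": 4, "d4": 1},
}
report = search_report(hits)
print(report)
```

Here document `d2` is hit by both terms, so it counts toward “Docs” for each term but toward “Unique Docs” for neither.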

Email Mapping Module

The email mapping module 34 may be configured to communicate with the platform connection module 32 or the database 28 to access the parsed data generated by the platform connection module 32. In one embodiment, the email mapping module 34 may be configured to normalize the parsed data, e.g., the parsed data stored in the database 28.

In a first example shown in FIG. 8A, the parsed data of three email records, namely “Record 1”, “Record 2”, and “Record 3”, may be mapped into an alias table. Each of the records may include multiple email fields, e.g., “FROM”, “To”, “CC”, and “BCC”, having multiple email addresses as data. For instance, as shown, the “Record 1” may include email addresses “anthony.j.thomas@gabco.ne” in the “FROM” field; “luke.daniels@gabco.ne” and “seth.andrews@gabco.ne” in the “To” field; “roses9009@online.ne” in the “CC” field; and “misaac8795@online.ne” in the “BCC” field.

Among these records, email addresses such as “luke.daniels@gabco.ne” may occur more than once. The email mapping module 34 may be configured to normalize the parsed data by creating an alias table including a unique set of data from the records. For instance, the alias table 130 may include columns, namely, “Alias_ID” and “Email_Address”, for storing only a single instance of the email addresses in the email records. Each instance of the unique email address may be saved under the column “Email_Address” and may be given a distinct identity (ID) under the column “Alias_ID”. For example, as shown, the email address “luke.daniels@gabco.ne” occurs in all the email records “Record 1”, “Record 2”, and “Record 3”. However, only a single instance of this email address may be saved in the alias table 130 under the “Email_Address” column and is given a distinct ID number “2” under the “Alias_ID” column. Single instances of other email addresses may be stored in the alias table 130 in a similar manner. In another embodiment, the email mapping module 34 may store a unique set of email domains such as gabco.ne and online.ne in a domains table (not shown), which may be similar to the alias table 130.
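The normalization into an alias table can be sketched as assigning the next free “Alias_ID” the first time each address is seen. In the Python example below, the contents of “Record 1” follow the FIG. 8A description, whereas the contents of “Record 2” and “Record 3” are invented for illustration (the description only states that “luke.daniels@gabco.ne” occurs in all three). The function name and the field scan order are also assumptions.

```python
from typing import Dict, List

EMAIL_FIELDS = ("FROM", "To", "CC", "BCC")


def build_alias_table(records: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Map each distinct email address to an Alias_ID, in first-seen order."""
    alias_ids: Dict[str, int] = {}
    for record in records:
        for field in EMAIL_FIELDS:
            for address in record.get(field, []):
                key = address.lower()
                if key not in alias_ids:
                    # Store a single instance with the next distinct ID.
                    alias_ids[key] = len(alias_ids) + 1
    return alias_ids


records = [
    {  # "Record 1", per the FIG. 8A description
        "FROM": ["anthony.j.thomas@gabco.ne"],
        "To": ["luke.daniels@gabco.ne", "seth.andrews@gabco.ne"],
        "CC": ["roses9009@online.ne"],
        "BCC": ["misaac8795@online.ne"],
    },
    {  # "Record 2" (hypothetical contents)
        "FROM": ["luke.daniels@gabco.ne"],
        "To": ["seth.andrews@gabco.ne"],
    },
    {  # "Record 3" (hypothetical contents)
        "FROM": ["seth.andrews@gabco.ne"],
        "To": ["luke.daniels@gabco.ne", "anthony.j.thomas@gabco.ne"],
    },
]
alias_table = build_alias_table(records)
print(alias_table["luke.daniels@gabco.ne"], len(alias_table))
```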

In a second example illustrated in FIG. 8B, the parsed data may include a record 132 having a collection of email addresses. The email mapping module 34 may be configured to normalize the email addresses based on the communicating parties related to these email addresses. The email mapping module 34 may associate multiple email addresses with a communicating party, such as an individual, and may represent them as a single ‘Email party’ in an alias table 134. Similarly, the email mapping module 34 may removably associate multiple email addresses in a record 136 with a group of individuals corresponding to the same email domain, organization, department, or entity (FIG. 8C) and may represent them as a single ‘Email party’ in an alias table 138. In some embodiments, such ‘email party’ may refer to a non-custodian party created to organize and/or assign one or more email addresses to a named entity for graphical representation and reporting. Other criteria for normalizing the data may be contemplated by those having skill in the art.

The email mapping module 34 may store the created alias tables such as the alias tables 130, 134, 138 and the domains table in the database 28. Such alias tables 130, 134, 138 may be used for the purpose of generating various reports and graphical representations, discussed below in greater detail with the descriptions of the AV module 42.

Advanced Filtering Module

The advanced filtering module 36 may be configured to parse the data corresponding to the files received from one or more modules, such as the platform connection module 32, into logical segments and perform predetermined analyses on the parsed data. The advanced filtering module 36 may include the filter module 40, the AV module 42, and the decision module 44.

Filter Module

The filter module 40 may perform filtering of the files, which may be registered in the inventory table created by the platform connection module 32, based on various selected facets of the data and values for the selected facets. In some embodiments, the filter module 40 may use the metadata associated with the files to identify those files wherein the values of the selected facets match the filter criteria. Examples of these facets may include, but are not limited to, custodians, dates, email domains, file types, terms or keywords, or current states of the electronic files, or any combination thereof. The filter module 40 may be further configured to apply one or more selected facets as a criterion for filtering the data. The filtered data and the associated files may be sent to the AV module 42 or the decision module 44, as selected by a user, for analyses.
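Facet-based filtering of inventory records can be pictured as matching each selected facet against a set of allowed values. The Python sketch below is a simplified illustration: the inventory rows, the facet names used as dictionary keys, and the `facet_filter` function are hypothetical, and range-style facets such as dates are not handled.

```python
from typing import Dict, List, Set


def facet_filter(
    inventory: List[Dict[str, str]],
    selected: Dict[str, Set[str]],
) -> List[Dict[str, str]]:
    """Keep rows whose value for every selected facet is among the allowed values."""
    return [
        row
        for row in inventory
        if all(row.get(facet) in allowed for facet, allowed in selected.items())
    ]


# Made-up inventory rows.
inventory = [
    {"file_id": "1", "custodian": "A. Smith", "file_type": "PDF"},
    {"file_id": "2", "custodian": "A. Smith", "file_type": "MOV"},
    {"file_id": "3", "custodian": "B. Jones", "file_type": "PDF"},
]
selected = {"custodian": {"A. Smith"}, "file_type": {"PDF", "DOCX"}}
filtered = facet_filter(inventory, selected)
print([row["file_id"] for row in filtered])
```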

Analysis and Visualizations Module (AV Module)

The AV module 42 may be configured to analyze the filtered data and the corresponding files received from the filter module 40 and represent the analyzed data in interactive formats, which may be viewed on, exported, mapped, or downloaded to various computing devices known in the art, related art, or developed later. The AV module 42 may include the file type analysis module 46 (FTA module 46), the email communication analysis module 48 (ECA module 48), the domain analysis module 50, the data anomaly analysis module 52, and the deduplication module 54.

File Type Analysis Module (FTA Module)

The FTA module 46 may represent the electronic files collectively as interactive graphs (not shown) based on file type. Examples of the graphs may include, but are not limited to, pie charts, bar graphs, line graphs, pictographs, and cosmographs. In one example, such graphs may illustrate count, file size, or any other aspect of the electronic files for all custodians or for one or more selected custodians. In another example, the FTA module 46 may be configured to create multiple sets of graphs based on parent-level file type groups, email items and attachments, all levels of file type groups, and so on. Alternatively or additionally, the file types and associated information (e.g., file name, file ID, custodian, etc.) may be represented in a grid for display. In some embodiments, the associated information such as file names may be hyperlinked to provide access to the files stored in the database 28. Other embodiments of the graphs may include the files being embedded in the corresponding file types, which may be represented in the grid for display.

Email Communication Analysis (ECA) Module

In one embodiment, the ECA module 48 may parse the received filtered files to identify email files among them. For each of the identified email files, the ECA module 48 may access the associated records and related alias tables (including the domains tables and other similar tables), which may be created by the email mapping module 34 and are stored in the database 28.

In an illustrated example shown in FIG. 9, the email files in the filtered data may refer to email records, namely ‘Record 1’, ‘Record 2’, and ‘Record 3’ and the corresponding alias table 130 (discussed in the description of FIG. 8A) stored in the database 28. The ECA module 48 may be configured to use the email records and the alias table 130 to create an email communication table 140. The exemplary table 140 may include multiple columns, namely, “Email_Comm_ID”, “From_Alias_ID”, “To_Alias_ID”, and “Record_ID”. The “Email_Comm_ID” may refer to a distinct ID of a record made in the email communication table 140. The “From_Alias_ID” may refer to the “Alias_ID” of a sender's email address in the alias table 130. The “To_Alias_ID” may refer to the “Alias_ID” of a recipient's email address in the alias table 130. The “Record_ID” may refer to the email record in the database 28 for which a corresponding record is made in the email communication table 140.

The email communication table 140 may store multiple distinct records, each corresponding to a sender-recipient pair using the assigned “Alias_ID” number for the email addresses in the alias table 130. For example, a record 142 in the email communication table 140 may have the “Email_Comm_ID” as “2” referring to a distinct record ID in the table 140, the “From_Alias_ID” as “1” referring to the email address “anthony.j.thomas@gabco.ne” in the alias table 130, the “To_Alias_ID” as “3” referring to the email address “seth.andrews@gabco.ne” in the alias table 130, and the “Record_ID” as “1” referring to the email record “Record 1” in the database 28. Similarly, other entries may be created in the email communication table 140.
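Building the email communication table from the email records and the alias table can be sketched as emitting one row per sender-recipient pair. In the Python example below, the alias IDs for the first three addresses match the description above, while the IDs 4 and 5, the treatment of CC and BCC addresses as recipients, and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Alias table 130 as a lookup; IDs 1-3 follow the text, 4-5 are assumed.
alias_ids: Dict[str, int] = {
    "anthony.j.thomas@gabco.ne": 1,
    "luke.daniels@gabco.ne": 2,
    "seth.andrews@gabco.ne": 3,
    "roses9009@online.ne": 4,
    "misaac8795@online.ne": 5,
}


def build_email_comm_table(
    records: List[Tuple[int, Dict[str, List[str]]]],
    aliases: Dict[str, int],
) -> List[Dict[str, int]]:
    """Create one row per sender-recipient pair, keyed by Alias_ID."""
    rows: List[Dict[str, int]] = []
    for record_id, record in records:
        for sender in record.get("FROM", []):
            for field in ("To", "CC", "BCC"):
                for recipient in record.get(field, []):
                    rows.append({
                        "Email_Comm_ID": len(rows) + 1,
                        "From_Alias_ID": aliases[sender],
                        "To_Alias_ID": aliases[recipient],
                        "Record_ID": record_id,
                    })
    return rows


record_1 = {
    "FROM": ["anthony.j.thomas@gabco.ne"],
    "To": ["luke.daniels@gabco.ne", "seth.andrews@gabco.ne"],
    "CC": ["roses9009@online.ne"],
    "BCC": ["misaac8795@online.ne"],
}
rows = build_email_comm_table([(1, record_1)], alias_ids)
print(rows[1])
```

The second row produced for “Record 1” corresponds to the record 142 described above (sender alias 1, recipient alias 3).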

In one embodiment, the ECA module 48 may be configured to use the email communication table 140 for graphically displaying email communications between individuals or various other entities such as a group of individuals, organizations, etc. In an illustrated example shown in FIG. 10A, an email communication diagram 150 may represent communication between one or more email parties by way of nodes and lines. Each node, e.g., a node 152, may represent an email party referring to a logical grouping of email addresses based on one or more predetermined characteristics such as an individual (e.g., indicated by metadata, an email name, etc.), a workgroup (e.g., indicated by metadata, a common keyword in the email name, etc.), or a domain, organization, department or entity (e.g., indicated by metadata, a domain name, etc.). Each logical grouping of email addresses may include at least one email address.

The ECA module 48 may be configured to provide different variations of the email communication diagrams based on characteristics of the email party selected by a user. Examples of these characteristics may include, but are not limited to, top communicators, custom communicators, and single party. In some embodiments, a selection of the ‘top communicators’ characteristic may result in display of the custodian and/or email party that has the highest communication volume relative to others; the ‘custom communicators’ characteristic may result in display of communication channels and other details (e.g., email count) between selected custodians and/or email parties; and the ‘single party’ characteristic may result in display of the communication channels and the other details of only one custodian or email party. In some other embodiments, the ECA module 48 may display an email communication diagram for all email addresses in the data received based on facets selected (or applied) by the filter module 40.

The email communication diagram 150 may represent communication (such as emails, SMS messages, etc.) of a top communicator with other email parties. For example, the node 152 may represent an email party such as “Bob Barker”, who may be a top communicator having the largest number of occurrences in the email records. Nodes such as the node 152 may refer to the logical grouping having only one email address. However, the node, and hence the email party, may be customized to include multiple email addresses from the given collection of records stored in the database 28 for a particular case or custodian. The node 152 may communicate with different nodes such as nodes 154-1, 154-2, 154-3, 154-4, and 154-5 (collectively, nodes 154). Each of the nodes 154 may be connected to the node 152 by one or more lines. For example, the node 154-1 may be connected to the node 152 by two lines such as arrows 156-1 and 156-2 (collectively, the arrows 156). Each line may represent a collective number of communications and direction of communications between at least two nodes or email parties. Lines such as the arrows 156 may refer to counts and/or links to the email records represented by them. In one embodiment, the arrows 156 may be curved and indicate the direction of communication by way of the pointing direction of the arrow heads.

In another example (FIG. 10B), an email communication diagram 160 may represent a customized communication among multiple email parties, each represented as nodes and connected by lines. In another embodiment, the lines may be straight, indicating the direction of communication by the pointing direction of the arrow heads. Nodes adjacent to the tail of each line may represent a sender and the nodes adjacent to each arrow head may represent a recipient. In a further embodiment (FIG. 10C), the direction of communication may be indicated through narrowing of the lines. The node adjacent to the broad side of a line may represent a source address (or a sender) and the node adjacent to the narrow side of the line may represent a destination address (or a recipient). In some embodiments, an email communication diagram may include lines having dynamic thickness, which may be directly related to the count, volume, or any other characteristic of the records being represented by these lines.

Some other embodiments may include each line having a predetermined color density corresponding to the number of communications between the addresses (such as email addresses) associated with the nodes of each line. The count/volume of communications, such as email, SMS messages, etc., for each sender-recipient pair in each direction may be indicated near the line. For example, the line 156-1 in the diagram 150 shows that 12,857 emails were sent from “Bob Barker” (node 152) to “Alfred Hitchcock” (node 154-1). Similarly, various nodes and lines in an email communication diagram may represent a diversity of information including, but not limited to, properties or statistics such as emails communicated with attachments, total size of the communicated emails, and emails communicated during a particular time period. Such graphical representation of communications between various nodes may assist in identifying witnesses or depositories of data that may be considered for the e-discovery investigations.

In one embodiment, the ECA module 48 may be configured to use the generated email communication diagrams, such as the diagram 150, for determining unknown witnesses or key witnesses for e-discovery investigations. In one example, the ECA module 48 may compare the email parties represented as nodes in the email communication diagrams with the custodians associated with a case. Upon comparison, the ECA module 48 may be configured to identify one or more email parties as unknown witnesses when the one or more email parties are not the same as the custodians associated with the case. In another example, the ECA module 48 may be configured to identify a key witness in an ESI investigation when an email party (represented as a node) (1) is the same as one of the custodians associated with the case, and (2) has the largest count of communications (e.g., emails sent and emails received) relative to the count of communications of other email parties.
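The comparison described above can be sketched in Python as follows. This is a minimal illustration only, not the disclosed implementation: the function name, the dictionary of per-party communication counts, and the sample figures other than the 12,857 count are hypothetical, and the sketch assumes that a key witness must be both a custodian and the party with the largest overall count.

```python
def classify_parties(counts_by_party, custodians):
    """Return (unknown_witnesses, key_witness) for one case.

    counts_by_party: hypothetical mapping of email party -> emails sent + received.
    custodians: set of parties already associated with the case.
    """
    # Parties that appear in the diagram but are not case custodians.
    unknown = sorted(p for p in counts_by_party if p not in custodians)
    # The party with the largest count, treated as key witness only if a custodian.
    top = max(counts_by_party, key=counts_by_party.get) if counts_by_party else None
    key = top if top in custodians else None
    return unknown, key


# Illustrative data only.
sample_counts = {"Bob Barker": 12857, "Alfred Hitchcock": 4200, "Jane Doe": 310}
unknown_witnesses, key_witness = classify_parties(sample_counts, {"Bob Barker"})
```

Here "Alfred Hitchcock" and "Jane Doe" would be reported as unknown witnesses because they are not custodians of the case.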

Domain Analysis Module

The domain analysis module 52 may be configured to categorize and graphically represent the filtered files based on email domains. For example, the filtered files may be represented domain-wise in an interactive tabular format under various column headers such as domain name, sender count, recipient count, etc. Each of the represented domains may be hyperlinked or referenced to the group of files associated with that domain. Such group of files, in one embodiment, may be provided to a user for access by being represented using associated metadata such as file ID, custodian, file name, etc. in a grid or various other representations known in the art, related art, or developed later for display. In some embodiments, the files may be embedded with the associated metadata represented in the grid for display.

Data Anomaly Analysis Module

The data anomaly analysis module 50 may be configured to represent a volume of data in multiple time segments over time for a given custodian or a group of custodians. Such a timeline diagram may enable a user to identify potential points in time where the data may be missing. Different categories of data may be represented over time by the data anomaly analysis module 50. Examples of these categories may include, but are not limited to, email volume, electronic documents (Edocs) volume, emails sent, and emails received.

The ‘email volume’ may refer to a count of all email-type records for a given custodian, where each record may be categorized by date. A corresponding email volume report may or may not be based on email mappings performed by the email mapping module 34. The ‘Edocs volume’ may refer to a count of Edocs-type records for a given custodian, where each record may be categorized by date. The ‘emails sent’ may refer to a count of email records for a custodian based on the email addresses that have been mapped to that custodian, where the email addresses may be categorized by date. The data anomaly analysis module 50 may consider every instance of email records where one of those “mapped” email addresses is found in the “FROM” field of an email file to determine a count of emails sent. The ‘emails received’ may refer to a count of email records for a custodian based on the email addresses that have been mapped or related to that custodian, where the email addresses may be categorized by date. Every email record where one of those “mapped” addresses is found in at least one of the email recipient fields, namely, “TO”, “CC”, and “BCC”, may be considered by the data anomaly analysis module 50 to determine a count of emails received.
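The ‘emails sent’ and ‘emails received’ counts can be illustrated with a short Python sketch. The record layout (a dictionary with a “FROM” string and “TO”, “CC”, and “BCC” lists), the function name, and the sample addresses are assumptions made for illustration; the disclosure does not prescribe them.

```python
def count_sent_received(records, mapped_addresses):
    """Count emails sent and received for one custodian.

    records: iterable of dicts with a "FROM" string and optional
             "TO"/"CC"/"BCC" lists (hypothetical layout).
    mapped_addresses: email addresses mapped to the custodian.
    """
    mapped = {a.lower() for a in mapped_addresses}
    sent = received = 0
    for rec in records:
        # Sent: a mapped address appears in the FROM field.
        if rec.get("FROM", "").lower() in mapped:
            sent += 1
        # Received: a mapped address appears in at least one recipient field.
        recipients = {a.lower() for f in ("TO", "CC", "BCC") for a in rec.get(f, [])}
        if recipients & mapped:
            received += 1
    return sent, received


# Illustrative records only.
sample_records = [
    {"FROM": "bob@example.com", "TO": ["al@example.com"]},
    {"FROM": "al@example.com", "TO": ["x@example.com"], "BCC": ["Bob@Example.com"]},
    {"FROM": "al@example.com", "CC": ["y@example.com"]},
]
sent_count, received_count = count_sent_received(sample_records, ["bob@example.com"])
```

An email counts once as received even if the mapped address appears in several recipient fields, which matches the "at least one of the email recipient fields" wording.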

In one embodiment, the data anomaly analysis module 50 may use an email communication table, such as the table 140, for plotting the total number of emails communicated by each email party over time in a graph. In an illustrated example shown in FIG. 11, the timeline diagram 180 is a graph showing the total number of emails on the y-axis and time (in years) on the x-axis. In some embodiments, the timeline diagram 180 may include the y-axis referring to units of digital information known in the art, related art, or developed later including kilobytes (KB), megabytes (MB), and gigabytes (GB); and the x-axis referring to time in months, days, hours, or any other known or later developed unit capable of being used to represent time. In some other embodiments, the y-axis may represent the total number of electronic files, emails sent, emails received, or any other aspect of data known in the art, related art, or developed later.

A curve may be plotted on a timeline diagram for every record in the email communication table, such as the email communication table 140. Each record may be associated with at least one date (e.g., sent and received dates for email; created and modified dates for non-email files, etc.) which may be assigned as year, month, day, or any other unit of time. In some embodiments, an email-sent-date may be prioritized over an email-received-date and a file-modified-date may be prioritized over a file-creation-date for plotting a curve on the timeline diagram. Each plotted curve may refer to a single custodian, a set of custodians grouped together, or any other entity such as an email party or group of email parties.
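The date prioritization described above may be sketched as follows. The field names (“sent”, “received”, “modified”, “created”) and the yearly bucketing are assumptions chosen for illustration; any other unit of time could be substituted.

```python
from collections import Counter
from datetime import date


def plot_date(record):
    """Pick the date used for plotting a record.

    Assumed priority, following the text: sent before received for email,
    modified before created for non-email files.
    """
    for field in ("sent", "received", "modified", "created"):
        value = record.get(field)
        if value is not None:
            return value
    return None


def volume_by_year(records):
    """Count records per year using the prioritized date of each record."""
    dates = (plot_date(r) for r in records)
    return Counter(d.year for d in dates if d is not None)


# Illustrative records only.
sample_dated = [
    {"sent": date(1997, 5, 1), "received": date(1998, 1, 1)},
    {"created": date(1996, 1, 1), "modified": date(1997, 3, 3)},
    {"received": date(1998, 2, 2)},
]
yearly_volume = volume_by_year(sample_dated)
```

The resulting per-year counts correspond to the points of one curve on the timeline diagram.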

In one example, a point P on the curve 182 may represent a volume of email files as “20,000” in the year “1997” for the email party “Steven Kean”. In another example, the timeline diagram 180 may include the curve 182 drawn for a group of custodians and the related aggregated data being represented along the y-axis as either an average or a cumulative total for the group of custodians. The data anomaly analysis module 50 may be further configured to provide interactive timeline diagrams. In one example, an interactive timeline diagram may be zoom-able to interactively expand or contract the timeline, e.g., as illustrated on the x-axis, into different time segments along the x-axis. Other examples may include the timeline diagram being enabled to allow interactive selection of points or ranges on the timeline diagram to refine or sub-select a set of data being represented for display.

The data anomaly analysis module 50 may be configured to use the generated timeline diagrams for assessing time-based anomalies in data for the e-discovery investigations. For this, the data anomaly analysis module 50 may identify various aspects (e.g., file types, hash values, file systems, etc.) associated with the filtered files and related data corresponding to one or more specified custodians. A date associated with each of the identified electronic files may be determined. The data anomaly analysis module 50 may also determine a number of electronic files associated with the specified custodians in each of a series of predefined or dynamically defined time segments (e.g., intervals of one year) over a period of time (e.g., a period of ten years) to display the corresponding timeline diagram. The data anomaly analysis module 50 may be configured to compare the number of electronic files (e.g., email files) between different time segments to identify those time segments with large and/or small numbers of electronic files as compared to a count of electronic files in other time segments for assessing the time-based anomalies in the data. For example, as shown in FIG. 11, a curve 184 may represent a volume of email files over seven years from “1996” to “2002”. The data anomaly analysis module 50 may determine the volume of email files at regular time segments or intervals of one year over a time period of seven years. The data anomaly analysis module 50 may be configured to display the curve 184 and related time segments ‘1996 to 1997’ and ‘1997 to 1998’, where the number or volume of email files is relatively lower than in other time segments for the custodian “Jeff Skilling”. The data anomaly analysis module 50 may be further configured to determine whether the actual number of files and related data (or critical data) are missing or have not been considered for ESI investigation based on a predetermined threshold value. The data anomaly analysis module 50 may generate an indication (e.g., a pop-up alert message, a beep, mouse vibration, etc.) to a user about the critical data being missed in one or more time segments, such as in time segments ‘1996 to 1997’ and ‘1997 to 1998’, when the data in these time segments is less than 10% of the total data retrieved by the platform connection module 32 from the third-party data repository 20. In some embodiments, the threshold value may be defined on-the-fly by the user based on a variety of parameters known in the art, related art, or developed later.
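The threshold comparison can be illustrated with a minimal Python sketch. The function name, the keying of each one-year segment by its start year, and the sample volumes are hypothetical; only the 10% threshold comes from the example above.

```python
def flag_sparse_segments(counts_by_segment, threshold=0.10):
    """Return the time segments whose share of the total falls below the threshold.

    counts_by_segment: hypothetical mapping of segment start year -> file count.
    threshold: fraction of the total below which a segment is flagged.
    """
    total = sum(counts_by_segment.values())
    if total == 0:
        return []
    return [seg for seg, n in counts_by_segment.items() if n / total < threshold]


# Illustrative volumes only (not taken from FIG. 11).
sample_volumes = {
    1996: 500, 1997: 800, 1998: 6000, 1999: 9000,
    2000: 12000, 2001: 15000, 2002: 7000,
}
flagged_segments = flag_sparse_segments(sample_volumes)
```

With these figures the total is 50,300 files, so any segment holding fewer than 5,030 files is flagged, and a caller could raise the pop-up alert or other indication for each flagged segment.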

Additionally or alternatively, the data anomaly analysis module 50 may identify deleted or missed electronic files and/or electronic files from at least one of the third-party data repository 20 and the database 28 for analyses, based on one or more predefined or dynamically defined threshold values. In some other embodiments, the data anomaly analysis module 50 may define the number of electronic files in each of the series of time segments for a group of custodians collectively. In further embodiments, the electronic files may represent e-mail files or electronic documents. Other embodiments may include the electronic files having e-mail files corresponding to e-mails sent to the custodians or those corresponding to e-mails sent from the custodians.

Deduplication Module

The DeDuplication module 54 may be configured to represent the files and related data, which are filtered by the filter module 40, in one or more predetermined schemes known in the art, related art, or developed later. Each scheme may provide counts of duplicate and non-duplicate data records based on the metadata associated with the files.

In one example, the DeDuplication module 54 may represent the files and related data in a Global DeDuplication scheme. According to this scheme, the DeDuplication module 54 may generate one or more reports indicating a number of duplicate and non-duplicate records for an entire case or a customizable group of cases. The data represented in the Global DeDuplication scheme may indicate a set number of records that may be exported to an e-discovery reviewing application.

In another example, the DeDuplication module 54 may represent the files and related data in a Custodian DeDuplication scheme. According to this scheme, the DeDuplication module 54 may generate one or more reports indicating a number of duplicate and non-duplicate records for each custodian or a customizable group of custodians. The data represented in the Custodian DeDuplication scheme may indicate that at least one copy of each duplicate record may be exported for each custodian, or the customizable group of custodians, to an e-discovery application, such as an e-discovery reviewing application.
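The difference between the two schemes can be sketched by counting the records that would be exported under each. The sketch assumes, for illustration only, that duplicates are detected by comparing a hash value stored in the metadata and that each record is a (custodian, hash) pair; the function and key names are hypothetical.

```python
def dedup_counts(records):
    """Summarize export counts under the two deduplication schemes.

    records: list of (custodian, hash_value) pairs (hypothetical layout).
    """
    total = len(records)
    # Global scheme: one copy of each distinct hash across the whole case.
    global_export = len({h for _, h in records})
    # Custodian scheme: one copy of each distinct hash per custodian.
    custodian_export = len(set(records))
    return {
        "total": total,
        "global_export": global_export,
        "global_duplicates": total - global_export,
        "custodian_export": custodian_export,
        "custodian_duplicates": total - custodian_export,
    }


# Illustrative records only.
sample_pairs = [("A", "h1"), ("A", "h1"), ("B", "h1"), ("B", "h2")]
dedup_report = dedup_counts(sample_pairs)
```

In this sample the file with hash "h1" is exported once under the global scheme but twice under the custodian scheme, once for each custodian that holds it.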

Decision Module

The decision module 44 may be configured to apply another filter facet referred to as file scope, which may indicate whether or not the files and related data received from the AV module 42 or the filter module 40 are relevant to the investigation. In one embodiment, the file scope may be represented as various labels, which may be preset based on the metadata associated with the files, or selected explicitly from dynamically defined labels by a user based on a manual review of the files and related data. In one example, the labels may be named “Include”, “Exclude”, or “Undecided” to indicate the status of files for being promoted to one of the stages, such as a review stage, of the e-discovery investigations. The label “Include” may indicate that the corresponding files are ready to be forwarded to a reviewing application, such as the third-party application 14. The label “Exclude” may refer to files and related data not intended to be promoted to an e-discovery reviewing application. The files and related data marked with the “Exclude” label may be considered irrelevant to the investigation. The label “Undecided” may refer to the default state of files and related data received by the advanced filtering module 36. The “Undecided” label may indicate, without limitation, that the marked files and related data are yet to be reviewed or need further review until a decision is made to “Include” or “Exclude” them.

In some embodiments, the decision module 44 may provide a “Committed” label in addition to the rest of the labels. Once the “Committed” label is selected and/or applied, the status of the files and related data marked with any of the other labels may become unchangeable. For example, when the status of files marked as “Included” is changed to “Committed”, the initially selected or marked label “Included” referring to the status of the files may bind them to be promoted to the review application. Similarly, the application of the “Committed” label may irrevocably seal the status of the files marked as “Excluded” or “Undecided”, which may however be subjected to offline analysis or forwarded to the e-discovery review application with the same initial status based on one or more user inputs.
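The labeling behavior can be pictured as a small state holder in which committing freezes the current label. The class below is an illustrative sketch only; its name, methods, and the choice of exception are assumptions and are not taken from the disclosure.

```python
class FileDecision:
    """Status decision for one file; a hypothetical illustration."""

    LABELS = ("Include", "Exclude", "Undecided")

    def __init__(self):
        self.label = "Undecided"   # default state described in the text
        self.committed = False

    def set_label(self, label):
        # After commit, the previously applied label can no longer change.
        if self.committed:
            raise PermissionError("status is committed and can no longer change")
        if label not in self.LABELS:
            raise ValueError(f"unknown label: {label}")
        self.label = label

    def commit(self):
        # Seals whichever label is currently applied.
        self.committed = True


decision = FileDecision()
decision.set_label("Include")
decision.commit()
```

After `commit()`, a further call such as `decision.set_label("Exclude")` raises an error, so the file stays bound for promotion to the review application.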

FIG. 12 illustrates an exemplary method for implementing the interactive case management system, according to an embodiment of the present disclosure. The exemplary method 190 may be described in the general context of computer executable instructions. Generally, computer executable instructions may include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. The computer executable instructions may be stored on a computer readable medium, and installed or embedded in an appropriate device for execution.

The order in which the method 190 is described is not intended to be construed as a limitation, and any number of the described method blocks may be combined or otherwise performed in any order to implement the method 190, or an alternate method. Additionally, individual blocks may be deleted from the method 190 without departing from the spirit and scope of the present disclosure described herein. Furthermore, the method 190 may be implemented in any suitable hardware, software, firmware, or combination thereof, that exists in the related art or that is later developed.

The method 190 describes, without limitation, implementation of the exemplary interactive case management system 16. One of skill in the art will understand that the method 190 may be modified appropriately for implementation in various manners without departing from the scope and spirit of the disclosure.

At step 192, the interactive case management system 16 retrieves electronic files and related data from the data source 12. In one embodiment, a user may log in to the interactive case management system 16 using predefined login credentials, e.g., a username and password, or any other access techniques such as those discussed above. The interactive case management system 16 may be configured to manage one or more cases, custodians, and clients, as well as compatible or associated file shares and databases. In one example, a case and one or more associated custodians may be created in the interactive case management system 16. The user may be assigned access rights to perform a variety of operations including, but not limited to, (1) case and custodian information management; (2) assignment and/or publishing of data for the created case from an outside data source such as the data source 12 to the interactive case management system 16; (3) configuring the published data for analyses, reporting, display, and export to one or more compatible systems; (4) user and user role (e.g., case roles, system roles, etc.) information management; and (5) billing management based on various schemes such as per case, per custodian, per user access, per role, and so on.

Once the case is created, the user may establish a communication link with the data source 12 and the third-party e-discovery application 14 through the data intake module 30. The link may be created using various wired or wireless interfaces and access techniques known in the art, related art, or developed later. For example, the data intake module 30 may communicatively connect with the data source 12 and the third-party application 14 via one or more USB cables and login credentials. The data source 12 may store a collection of loose, unprocessed, electronic, computer readable files such as system files, program files, document files, multimedia files, and emails, which may be accessed for the case by the data intake module 30.

At step 194, at least one of the accessed files and related data are culled based on predetermined filter criteria. The data intake module 30 may include various predefined or dynamically defined criteria for filtering the accessed files and related data. Examples of these criteria may include, but are not limited to, one or more file paths, date ranges, file types, and hash values. The data intake module 30 may determine the file information (e.g., filename, file path, system date, etc.) of the accessed files, hash values (e.g., MD5 hash values), reference hash tables such as the reference hash tables 102, and file types (e.g., PDF, PST email database, MOV video, WAV audio, TIFF image, etc.) to implement the filter criteria. The user may select one or more filter criteria to cull at least one of the files and related data, which may not be relevant for the e-discovery request or investigation. The remaining files and related data may be returned as a positive filter result for each of the selected filter criteria.
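A minimal Python sketch of this culling step follows. The `FileInfo` record, the `cull` function, and the treatment of the reference hash table as a set of hashes to be excluded are assumptions made for illustration; the disclosure leaves the exact criteria to the user.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical file information gathered at intake."""
    path: str
    file_type: str
    system_date: date
    md5: str


def cull(files, file_types, start, end, reference_hashes):
    """Split files into (kept, culled) using type, date range, and hash criteria."""
    kept, culled = [], []
    for f in files:
        relevant = (
            f.file_type in file_types
            and start <= f.system_date <= end
            and f.md5 not in reference_hashes  # assumed: listed hashes are excluded
        )
        (kept if relevant else culled).append(f)
    return kept, culled


# Illustrative files only.
sample_files = [
    FileInfo("/share/a.pdf", "PDF", date(1999, 4, 1), "aa11"),
    FileInfo("/share/sys.dll", "DLL", date(1999, 4, 1), "bb22"),
    FileInfo("/share/old.pdf", "PDF", date(1990, 1, 1), "cc33"),
    FileInfo("/share/known.pdf", "PDF", date(2000, 1, 1), "dd44"),
]
kept_files, culled_files = cull(
    sample_files, {"PDF", "PST"}, date(1996, 1, 1), date(2002, 12, 31), {"dd44"}
)
```

The kept list corresponds to the positive filter result, while the culled list holds the files that stay behind in the data source.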

At step 196, the remaining files and related data are stored in a third-party data repository. The data intake module 30 may access the third-party application 14 and store the files and related data obtained as the positive filter result in the data repository 20. On the other hand, the data intake module 30 may tag the culled files and related data with a label stating “Filtered”. The culled files and related data may not be moved or mapped to the predetermined location such as the data repository 20 from the data source 12. In one embodiment, the data intake module 30 may record the filtering information for the positive filter result, or otherwise, in the intake table. The filtering information may include, but is not limited to, a list of values inputted for the selected filter criteria, actual files and related data, and so on.

At step 198, a set of files and related data stored in the third-party data repository 20 may be mapped into a predetermined database schema. The third-party data repository 20 may store the files and related data in various known or proprietary formats and schema. In one embodiment, the interactive case management system 16, upon a user request, may map a set of files and related data from the data repository 20 to the database 28.

The user may log in to the third-party application 14 through the platform connection module 32 using various access techniques known in the art, related art, or developed later. The platform connection module 32 may determine the file system type and database information including the database table having the metadata of records and the file share location in the data repository 20. The database table of the data repository 20 may be used to determine the location of files and related data.

In one embodiment, the third-party application 14 may authorize the platform connection module 32 to access those files and related data that may be uncorrupted and/or available for being published by the interactive case management system 16. The accessed files and related data may be mapped to a predetermined schema so that the files and related data are usable by the interactive case management system 16.

The platform connection module 32 may parse the accessed data (e.g., body of an email, content of a word file, a file embedded in another file, etc.) and related metadata and map the parsed data into various tables in the predetermined schema implemented by the database 28.

For example, the data repository 20 may store the data in a table having fields “Author_Name” and “Email_Subject”. However, a table of the predetermined schema may have fields “Author” and “Subject”. In one embodiment, the platform connection module 32 may be configured to read the files and map the related data from the “Author_Name” and the “Email_Subject” fields to the respective “Author” and the “Subject” fields of the predefined schema of database 28.
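The field mapping in this example can be sketched as a simple lookup. The four field names come from the example above; the `FIELD_MAP` constant, the `map_record` function, and the decision to drop unmapped fields are illustrative assumptions.

```python
# Repository field -> predetermined schema field (names from the example above).
FIELD_MAP = {"Author_Name": "Author", "Email_Subject": "Subject"}


def map_record(repository_record):
    """Rename repository fields to schema fields; unmapped fields are dropped."""
    return {
        FIELD_MAP[name]: value
        for name, value in repository_record.items()
        if name in FIELD_MAP
    }


# Illustrative record only.
mapped_record = map_record(
    {"Author_Name": "Bob Barker", "Email_Subject": "Q3 forecast", "Vendor_Flag": 1}
)
```

A different third-party repository would only need a different mapping table, leaving the schema of database 28 unchanged.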

In some embodiments, the platform connection module 32 may reverse map predetermined information associated with the mapped files and related data from the predefined schema of database 28 to the data repository 20. Such reverse-mapped information (e.g., custodian ID, media ID, data source ID, work package ID, etc.) may be tagged to facilitate tracking, in the data repository 20, of the tagged files and data that are mapped in the predefined schema of database 28. The mapped files and related data may be stored in the database 28 so as to be used for analyses, reporting, display, and export to one or more compatible systems or applications.

Exemplary steps for storing the mapped data in the database 28 are illustrated in FIG. 13. At step 212, data for being stored in the predetermined database schema are received. The platform connection module 32 may create a metadata table, such as metadata table 230 (FIG. 14), and an extracted text table, such as extracted text table 232, for the data mapped in the predetermined schema. The metadata table 230 may store metadata (e.g., filename, data size, hash value, email subject, etc.) associated with the mapped data and the extracted text table 232 may store the extracted text such as body of an email, contents of a word file, etc. from the files corresponding to the mapped data.

The mapped data may be associated with one or more hash values of a predetermined hashing algorithm. For example, the mapped data may be associated with one or more MD5 hash values. The platform connection module 32 may be configured to determine at least one hash value associated with the mapped data. In some embodiments, these hash values may be calculated by the platform connection module 32 or the data intake module 30 for the mapped files. In other embodiments, the hash values corresponding to the mapped data may be determined from the intake table created by the data intake module 30 at the time of ingesting the original data and the corresponding files from the data source 12 to the data repository 20.

At step 214, the at least one hash value is compared with the hash values in the metadata table 230. In one embodiment, the platform connection module 32 may compare at least one MD5 hash value related to the mapped data with the hash values in the metadata table 230. At step 216, the platform connection module 32 may check if the at least one hash value, such as an MD5 hash value, exists in the metadata table 230.

At step 218, if the platform connection module 32 determines that the at least one hash value already exists in the metadata table 230, the platform connection module 32 may be configured to determine the metadata (MD) identity number “MD_ID” corresponding to the at least one hash value existing in the metadata table 230.

At step 220, if the platform connection module 32 determines that the at least one hash value does not exist in the metadata table 230, the platform connection module 32 may be configured to insert a new record for the metadata associated with the mapped data and assign a new MD_ID to this new record in the metadata table 230.

At step 222, the platform connection module 32 may be configured to map the MD_ID from step 218 or step 220, and the corresponding metadata from the metadata table 230, to an inventory table 234. In some embodiments, the platform connection module 32 may refer to the extracted text in the extracted text table 232 using the MD_ID and may additionally copy the extracted text to the inventory table 234. In FIG. 14, two records having inventory IDs “1” and “3” in the inventory table 234 have the same metadata ID, i.e., “1”, indicating that both the records refer to the same metadata in the metadata table 230 and the extracted text table 232. Therefore, a single copy of the metadata and the extracted text may be maintained and stored in the database 28. Such single instance storage of the metadata and the extracted text saves storage space and reduces data insert time in the database tables, particularly for those which hold larger amounts of data on average per record. Various modules of the interactive case management system 16 may use the inventory table 234 for data analyses, reporting, display, or export to the other systems or applications.
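Steps 212 through 222 can be sketched in Python as a small in-memory store. This is an illustration under stated assumptions, not the disclosed implementation: the class and attribute names are hypothetical, Python dictionaries and lists stand in for the database tables, and the MD5 value is computed from the file content with the standard `hashlib` module.

```python
import hashlib


class SingleInstanceStore:
    """In-memory stand-in for the metadata, extracted text, and inventory tables."""

    def __init__(self):
        self.md_id_by_hash = {}   # MD5 hex digest -> MD_ID
        self.metadata = {}        # MD_ID -> metadata dict (one copy per hash)
        self.extracted_text = {}  # MD_ID -> extracted text (one copy per hash)
        self.inventory = []       # list of (inventory_id, MD_ID)

    def add(self, content, metadata, text):
        digest = hashlib.md5(content).hexdigest()
        md_id = self.md_id_by_hash.get(digest)     # steps 214-216: look up the hash
        if md_id is None:                          # step 220: new metadata record
            md_id = len(self.metadata) + 1
            self.md_id_by_hash[digest] = md_id
            self.metadata[md_id] = metadata
            self.extracted_text[md_id] = text
        # step 218 is the implicit reuse of an existing MD_ID;
        # step 222: every file gets its own inventory row pointing at the MD_ID.
        inventory_id = len(self.inventory) + 1
        self.inventory.append((inventory_id, md_id))
        return md_id


store = SingleInstanceStore()
store.add(b"quarterly report", {"filename": "q3.doc"}, "quarterly report")
store.add(b"meeting notes", {"filename": "notes.txt"}, "meeting notes")
store.add(b"quarterly report", {"filename": "q3_copy.doc"}, "quarterly report")
```

After these three calls the inventory rows 1 and 3 share MD_ID 1, mirroring the FIG. 14 example, while only two metadata records and two extracted text records are stored.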

In some embodiments, accessed files and related data that have been published by mapping the parsed data into one or more tables in the predetermined schema of the database 28 may be unpublished by the platform connection module 32. Upon being unpublished, the mapped files and related data may be removed from at least one of the database 28 and the data repository 20. In some other embodiments, the platform connection module 32 may be configured not to unpublish the mapped and/or stored files and related data if any of the files or related data is already associated by a user with at least one of the status decisions including “Included”, “Excluded”, “Committed”, or the like.

In further embodiments, the platform connection module 32 may be configured to index the mapped files and related data using a variety of types of indexes known in the art, related art, or developed later. In one embodiment, the platform connection module 32 may create a full-text index to support full-text searching of various data records stored in the database 28.

The processor 22 may receive from a user one or more search terms or metadata elements to be searched in the database tables. The search terms may be used for searching the data records using various tools, techniques, and syntaxes known in the art, related art, or developed later, including “dtSearch” searching technology to implement full-text searching. The search terms may be stored as metadata for the corresponding case in the metadata table, such as the metadata table 230, in the database 28.

Based on the search, the processor 22 may generate search reports including various predetermined fields and columns. In one embodiment, the generated search reports may include at least one column that provides a measure of impact for each search term. For example, the search report may provide the total number of files (and corresponding size such as in gigabytes), which are exclusive hits by each of the search terms as compared to other terms in a search query.
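The exclusive-hit measure can be illustrated with set arithmetic. The sketch assumes that the search has already produced, for each term, the set of file IDs it hits; the function name and the sample terms and IDs are hypothetical.

```python
def exclusive_hits(hits_by_term):
    """Count, per search term, the files hit by that term and by no other term.

    hits_by_term: hypothetical mapping of search term -> set of file IDs.
    """
    report = {}
    for term, ids in hits_by_term.items():
        # Union of the hits of every other term in the query.
        others = set().union(*(v for t, v in hits_by_term.items() if t != term))
        report[term] = len(ids - others)
    return report


# Illustrative hits only.
sample_hits = {"fraud": {1, 2, 3}, "merger": {3, 4}, "audit": {4}}
impact_report = exclusive_hits(sample_hits)
```

A term with zero exclusive hits, such as "audit" here, adds no files beyond those already returned by the other terms, which is one way to read the measure of impact.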

At step 200 (FIG. 12), the mapped files and related data are analyzed. The mapped files and related data may be analyzed by the advanced filtering module 36 based on a user input. One exemplary method implemented by the advanced filtering module 36 is illustrated in FIG. 15.

As shown, at step 242, at least one predefined filter facet and a value or a range of values for that facet may be selected from a plurality of predefined filter facets based on a user input. The filter module 40 may include a variety of filter facets predefined based on the metadata corresponding to the mapped files. Examples of these facets may include, but are not limited to, custodians, dates, email domains, file-types, and keywords. In some embodiments, the filter module 40 may allow a user to define these filter facets on-the-fly, i.e., dynamically. Among the defined filter facets, the user may select at least one filter facet for filtering the mapped files and related data.

At step 244, an SQL query may be created for the selected at least one filter facet. The filter module 40 may employ various tools, techniques, and protocols in any computer language to communicate with the database 28. In one embodiment, the filter module 40 may convert a filter facet selection by a user into an SQL query for communicating with the database 28.

At step 246, at least one file and related data may be retrieved based on the created SQL query. The filter module 40 may apply the created SQL query to search for the corresponding data records in various tables of the database 28. In one embodiment, the complete family of records may be included within the scope of the SQL query for searching the relevant data records in the database 28. The family of records may refer to multiple files associated with each other in an attachment hierarchy. For example, the SQL query may correspond to a ‘file type’ filter facet such as email items. The query may return a filter result including at least one email file, which may be associated with another file such as a word file or a GIF file being an attachment of the email file. The at least one email file and its attachments may be retrieved by the filter module 40.

At step 248, a temporary table including the retrieved at least one file and related data is created. In one embodiment, the filter module 40 may create a temporary table for storing the filter result. The filter result may include the retrieved files and related data, and the corresponding selected filter facet. The temporary table may be stored in the database 28 by the filter module 40.

At step 250, one or more statistical reports may be generated using the temporary table based on at least one predetermined parameter. In one embodiment, various modules, such as the ECA module 48, in the advanced filtering module 36 may use the temporary table for generating statistical reports. The reports may be generated based on various predetermined parameters derived from the metadata associated with the retrieved files in the temporary table. Examples of these parameters may include, but are not limited to, document type (e.g., indicating count of email files versus other electronic files), direct search hits based on the filter facets, indirect search hits based on the filter facets, and so on. Direct search hits may refer to the documents that meet the exact search criteria specified by the selected filter facet. Indirect search hits may refer to additional family documents (e.g., email attachments, etc.) associated with a document that meets the exact search criteria specified by the selected filter facet.
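Steps 244 through 250 can be sketched with Python's standard `sqlite3` module. Everything specific in the sketch is an assumption made for illustration: the `inventory` table and its columns, the facet-to-column whitelist, the single-level `parent_id` link used to model the attachment hierarchy, and the `filter_result` temporary table are hypothetical and are not the schema of database 28.

```python
import sqlite3

# Hypothetical whitelist mapping filter facets to column names.
FACET_COLUMNS = {"file_type": "file_type", "custodian": "custodian"}


def build_facet_query(facet, values):
    """Step 244: turn a facet selection into a parameterized SQL query."""
    column = FACET_COLUMNS[facet]  # unknown facets raise KeyError
    placeholders = ", ".join("?" for _ in values)
    return f"SELECT id FROM inventory WHERE {column} IN ({placeholders})", list(values)


def run_filter(conn, facet, values):
    """Steps 246-250: retrieve families, fill a temporary table, summarize hits."""
    direct_sql, params = build_facet_query(facet, values)
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS filter_result "
        "(id INTEGER, file_type TEXT, size_bytes INTEGER, hit_type TEXT)"
    )
    conn.execute("DELETE FROM filter_result")
    # Direct hits match the facet; indirect hits are their attachments.
    conn.execute(
        "INSERT INTO filter_result "
        "SELECT id, file_type, size_bytes, "
        f"CASE WHEN id IN ({direct_sql}) THEN 'Direct' ELSE 'Indirect' END "
        "FROM inventory "
        f"WHERE id IN ({direct_sql}) OR parent_id IN ({direct_sql})",
        params * 3,
    )
    return conn.execute(
        "SELECT hit_type, COUNT(*), SUM(size_bytes) FROM filter_result "
        "GROUP BY hit_type ORDER BY hit_type"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "file_type TEXT, custodian TEXT, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "email", "A", 1000),
        (2, 1, "doc", "A", 5000),    # attachment of email 1
        (3, None, "doc", "B", 3000),
    ],
)
hit_report = run_filter(conn, "file_type", ["email"])
```

Filtering on the email file type returns the email as a direct hit and its attachment as an indirect hit, while the unrelated document is left out. A deeper attachment hierarchy would need a recursive query, which this sketch does not attempt.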

At step 252, the generated one or more statistical reports are displayed graphically to a user. Various modules associated with the advanced filtering module 36 may display the generated statistical reports graphically on a display device. The example of FIG. 16 shows an exemplary interface screen 260 of the interactive case management system 16. The interface screen 260 may include a filter facet section 262, a filter summary section 264, a views section 266, a results section 268, and a content section 270, each of which may be resizable, collapsible, or capable of being dragged over the others.

The filter facet section 262 may display various predefined or dynamically defined filter facets for being selected by the user. The filter summary section 264 may display current statistics for voting decisions on the filtered files and related data. The views section 266 may show statistics and visualizations about the files and related data obtained as filter results upon application of the selected filter facets. The results section 268 may display the metadata for the files obtained as filter results upon application of the selected filter facets. The content section 270 may display the extracted text from a file corresponding to metadata selected in the results section. In some embodiments, the content section 270 may also provide an option to download the original file stored in the database 28.

In a first embodiment (FIG. 16), the filter module 40 may display the statistical reports graphically in the views section 266. In one example, the filter module 40 may display the statistical reports in a table 272 based on document type. As shown, the table 272 may include columns “Count” and “Size (GB)” for each of the file types selected by way of the filter facets in the filter facet section 262. The “Count” may refer to the total number of files of a particular file type, and “Size (GB)” may refer to the combined size of the files of that file type. In another example, the filter module 40 may display the statistical reports graphically in a table 274 based on direct and indirect search hits. As shown, the table 274 may include columns “Count” and “Size (GB)” for each of the direct search hits represented as “Direct” and the indirect search hits represented as “Indirect”. In some embodiments, the graphically displayed statistical reports may be associated with one or more predefined or dynamically defined widgets.

The results section 268 may display the metadata for the direct search hits. The metadata may include multiple files and related data including file ID represented under the column “ID”, associated custodian represented under the column “File Name”, file type represented under the column “File Type”, and current decision status represented under the column “Decision” indicating whether the corresponding file and related data are relevant for the e-discovery investigation and may be submitted to an e-discovery application such as the third-party application 14 or any other e-discovery application. Each of such column headers (e.g., “ID”, “File Name”, “File Type”, “Decision”, etc.) may be configured to sort the respective underlying data in numeric or alphanumeric order. For example, the column header “ID” may be clicked to sort the underlying IDs in ascending or descending order. Additionally or alternatively, one or more columns may be temporarily added or removed. In some embodiments, the results section 268 may be configured to receive one or more inputs such as text for filtering the metadata displayed in one or more columns.

Upon selecting a metadata record in the results section 268, the corresponding data or extracted text may be displayed in the content section 270. For example, when a metadata record 276 is selected, the extracted text or content from a corresponding file may be displayed in the content section 270.

The filter summary section 264 may display the current statistics of the files and related data displayed in the results section 268. For example, the filter summary section 264 may display the count of files and related data as a pie chart 278, and the corresponding total size of the files and related data may be displayed as a pie chart 280. The pie charts 278, 280 may be color coded to represent the corresponding count and size of files based on the status decision, such as “Include”, “Exclude”, or “Undecided”, of the files. In one instance, the files having the status decision as “Include” may be represented by green color, the files having the status decision as “Exclude” may be represented by red color, and the files having the status decision as “Undecided” may be represented by grey color.

In a second embodiment (FIG. 17), the ECA module 48 may display communications (e.g., email communications, SMS messages, etc.) between two or more communicating parties graphically in the views section 266 of the interface screen 260. In one example, the ECA module 48 may display an email communication diagram 282 in which each email party is represented as a node, and two or more nodes are connected using lines representative of the email communication between each pair of the nodes. Each line may refer to one or more files and related data communicated between the corresponding nodes. These files and related metadata corresponding to the line, upon being selected, may be displayed in the results section 268. For example, the user may click on the line 284 to display the corresponding files and related metadata in the results section 268. In some embodiments, the email communication diagrams may be associated with one or more predefined or dynamically defined widgets.
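One possible data structure behind such a communication diagram is sketched below. The record layout (`id`, `sender`, `recipients`), the sample addresses, and the function `build_communication_graph` are hypothetical and serve only to show how each line between two nodes might carry the files that would be listed in the results section when the line is selected. The sketch treats a line as undirected; a directed variant would simply keep the (sender, recipient) order.

```python
from collections import defaultdict

# Hypothetical email metadata; the field names are assumptions.
emails = [
    {"id": 101, "sender": "alice@example.com",
     "recipients": ["bob@example.com"]},
    {"id": 102, "sender": "bob@example.com",
     "recipients": ["alice@example.com", "carol@example.com"]},
    {"id": 103, "sender": "carol@example.com",
     "recipients": ["bob@example.com"]},
]

def build_communication_graph(messages):
    """Return (nodes, edges); edges maps a pair of parties to file IDs."""
    edges = defaultdict(list)
    for message in messages:
        for recipient in message["recipients"]:
            # Sort the pair so that A->B and B->A share one line.
            pair = tuple(sorted((message["sender"], recipient)))
            edges[pair].append(message["id"])
    nodes = sorted({party for pair in edges for party in pair})
    return nodes, dict(edges)

nodes, edges = build_communication_graph(emails)

# Selecting a line corresponds to looking up the files attached to that pair.
selected_line = ("alice@example.com", "bob@example.com")
print(edges[selected_line])
```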

Similar to the first embodiment, a metadata record in the results section 268 may be selected to display the corresponding data or extracted text in the content section 270. For example, the metadata record 276 may be selected to display the corresponding extracted text or content in the content section 270. Further, the filter summary section 264 may display the count of files and related data corresponding to the selected line 284 as the color-coded pie chart 278, and the corresponding total size of the files and related data may be displayed as the color-coded pie chart 280. Other embodiments may include display of timeline diagrams, such as the timeline diagram 180, for assessing time-based anomalies in data by determining deleted or missed files and related data based on one or more predefined or dynamically defined threshold values.

Further to the method 190 implemented by the interactive case management system 16, at step 202 (FIG. 12), a status decision may be applied on the analyzed files and related data. The analyzed files may be subjected to the decision module 44 configured to apply a status decision indicating if the files are relevant for the e-discovery investigations. In one embodiment, the decision module 44 may allow a user to select at least one of the labels, namely, “Include”, “Exclude”, “Undecided”, and “Committed” to indicate a file status decision.

The “Include” label may indicate that the corresponding files are relevant for e-discovery investigations and may be forwarded to the e-discovery reviewing application. The label “Exclude” may indicate that the files and related data are not relevant for the e-discovery investigations and may not be forwarded to the e-discovery reviewing application. The label “Undecided” may refer to the default state of files and related data received by the decision module 44. The “Undecided” label may indicate, without limitation, that the corresponding files and related data are either yet to be reviewed or need further review until a decision is made to “Include” or “Exclude” them.

The “Committed” label may indicate that the status of the corresponding files and related data has been finalized. In one example, the status of a file marked with the “Include” label may be considered as final if the label is changed to “Committed” by the user. Hence, the status of the file cannot be changed any further after the “Committed” label is selected by the user.

At step 204, at least one analyzed file and related data is submitted to an e-discovery application based on the applied status decision. The advanced filtering module 36 may be configured to submit the analyzed files and related data to, or hold them back from, the e-discovery application, such as the third-party application 14 or any other application, based on the file status decision selected by the user. For example, if the user selects the status of an analyzed file as “Include”, and then “Committed”, the analyzed file may be considered as being relevant for the e-discovery investigations and hence may be forwarded or submitted to the e-discovery application, such as an e-discovery reviewing application. In some embodiments, the analyzed files may be subjected to de-duplication by the DeDuplication module 54 after the “Committed” label has been applied, for further reducing the volume of data to be promoted to the e-discovery reviewing application. The relevant volume of data corresponding to the analyzed files may be displayed by the DeDuplication module 54 as a high-level snapshot by running a filter for all files and related data labeled as “Include” only.

In case the status of the file is selected to be “Exclude” or “Undecided”, or is not selected to be “Committed”, the corresponding file may be held back with the interactive case management system 16 in the database 28 by the advanced filtering module 36. Alternatively, the status of the electronic file in the data repository 20 may be updated with any change in status by the decision module 44.
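The status decision flow of steps 202 and 204 may be sketched as follows. This is only one possible reading of the description: the class `FileRecord`, the enumeration `Decision`, and the function `files_to_submit` are hypothetical, and modeling “Committed” as a separate flag that freezes the last decision (rather than as a fourth label value) is an assumption made for clarity.

```python
from enum import Enum

class Decision(Enum):
    UNDECIDED = "Undecided"   # default state on arrival at the decision module
    INCLUDE = "Include"
    EXCLUDE = "Exclude"

class FileRecord:
    """Hypothetical record tracking the status decision of one file."""

    def __init__(self, file_id):
        self.file_id = file_id
        self.decision = Decision.UNDECIDED
        self.committed = False

    def set_decision(self, decision):
        # Once committed, the status cannot be changed any further.
        if self.committed:
            raise ValueError("status is committed and cannot be changed")
        self.decision = decision

    def commit(self):
        self.committed = True

def files_to_submit(records):
    """Only files marked Include and then Committed are forwarded."""
    return [r.file_id for r in records
            if r.committed and r.decision is Decision.INCLUDE]

records = [FileRecord(1), FileRecord(2), FileRecord(3), FileRecord(4)]
records[0].set_decision(Decision.INCLUDE)
records[0].commit()                          # Include + Committed: submitted
records[1].set_decision(Decision.EXCLUDE)
records[1].commit()                          # Exclude: held back
records[2].set_decision(Decision.INCLUDE)    # not committed: held back
submitted = files_to_submit(records)         # record 4 stays Undecided
print(submitted)
```

Files that do not appear in the returned list correspond to those held back in the database 28.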

Exemplary embodiments are intended to cover all software or computer programs capable of performing the various heretofore-disclosed determinations, calculations, etc., for the disclosed purposes. For example, exemplary embodiments are intended to cover all software or computer programs capable of enabling processors to implement the disclosed processes. In other words, exemplary embodiments are intended to cover all systems and processes that configure a computing device to implement the disclosed processes. Exemplary embodiments are also intended to cover any and all currently known, related art or later developed non-transitory recording or storage mediums (such as a CD-ROM, DVD-ROM, hard drive, RAM, ROM, floppy disc, magnetic tape cassette, etc.) that record or store such software or computer programs. Exemplary embodiments are further intended to cover such software, computer programs, systems and/or processes provided through any other currently known, related art, or later developed medium (such as transitory mediums, carrier waves, etc.), usable for implementing the exemplary operations disclosed above.

In accordance with the exemplary embodiments, the disclosed computer programs may be executed in many exemplary ways, such as an application that is resident in the memory of a device or as a hosted application that is being executed on a server and communicating with the device application or browser via a number of standard protocols, such as TCP/IP, HTTP, XML, SOAP, REST, JSON and other sufficient protocols. The disclosed computer programs may be written in exemplary programming languages that execute from memory on the computing device or from a hosted server, such as BASIC, COBOL, C, C++, Java, Pascal, or scripting languages such as JavaScript, Python, Ruby, PHP, Perl or other sufficient programming languages.

The above description does not provide specific details of manufacture or design of the various components. Those of skill in the art are familiar with such details, and unless departures from those techniques are set out, known, related art, or later developed techniques, designs, and materials should be employed. Those in the art are capable of choosing suitable manufacturing and design details.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. It will be appreciated that several of the above-disclosed and other features and functions, or alternatives thereof, may be combined into other systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may subsequently be made by those skilled in the art without departing from the scope of the present disclosure as encompassed by the following claims.

What is claimed is:
 1. A method for selecting electronic files associated with an investigation in a computer system including at least one processor, at least one electronic storage device coupled with at least one processor and at least one display coupled to the at least one processor comprising: the at least one processor receiving at least one facet for the basis of the selecting and at least one value associated with each facet, the at least one facet including custodians of the electronic files, dates associated with the electronic files, e-mail domains associated with e-mails represented by the electronic files, file types of the electronic files, terms included within the electronic files, or current states of the electronic files or any combination thereof; the at least one processor filtering for computer readable electronic files stored in the at least one electronic storage device meeting the at least one value associated with the at least one facet received by the at least one processor; the at least one processor displaying on the at least one display metadata associated with the electronic files identified in the filtering; the at least one processor causing contents of an electronic file selected based on the metadata to be displayed on the at least one display; and the at least processor recording an indication from a user as to whether or not an electronic file is responsive to an investigation.
 2. The method of claim 1 wherein the investigation includes a discovery request.
 3. The method of claim 1 wherein: the at least one processor causes possible facets to be displayed on a first portion of the at least one display and enables a user to click on one or more facets to select one or more facets; the at least one processor causes metadata related to electronic files identified by the filtering on a second portion of the at least one display and enables a user to select one or more of the electronic files; and the at least one processor causes contents of the selected electronic files to be displayed on a third portion of the at least one display.
 4. The method of claim 1 wherein the at least one processor causes a graphical representation of the electronic files to be displayed on the at least one display wherein nodes in the graphical representation represent communication addresses associated with the electronic files and lines between nodes represent a number and direction of communications between the communication addresses.
 5. The method of claim 1 wherein the at least one processor causes file types of the electronic files to be displayed on the at least one display.
 6. The method of claim 1 wherein the at least one processor causes a representation of a number of communications represented by the electronic files in a plurality of time segments over a period of time to be displayed on the at least one display.
 7. The method of claim 1 wherein the at least one processor causes domains of the electronic files to be displayed on the at least one display.
 8. The method of claim 1 wherein the at least one processor causes both the number of electronic files responsive to each selected facet and the number of electronic files responsive to each selected facet and no other selected facet to be displayed on the at least one display.
 9. The method of claim 1 wherein the at least one processor performs filtering employing an SQL query.
 10. The method of claim 1 wherein the at least one processor receives a sub-selection for further filtering of identified electronic files.
 11. A computer system for selecting electronic files associated with an investigation comprising: at least one processor; at least one electronic storage device coupled to the at least one processor; and at least one display coupled to the at least one processor, wherein: the at least one processor receives at least one facet for the basis of the selecting and at least one value associated with each facet, the at least one facet including domains of the electronic files, dates associated with the electronic files, e-mail domains associated with e-mails represented by the electronic files, file types of the electronic files, terms included within the electronic files, or current states of the electronic files, or any combination thereof: the at least one processor filters for computer readable electronic files stored in the at least one storage device meeting the at least one value associated with the at least one facet received by the at least one processor; the at least one processor displays on the at least one display metadata associated with the electronic files identified in the filtering; the at least one processor causes contents of an electronic file selected based on the metadata to be displayed on the at least one display; and the at least one processor records an indication from a user as to whether or not an electronic file is responsive to an investigation.
 12. The system of claim 11 wherein the investigation includes a discovery request.
 13. The system of claim 11 wherein: the at least one processor causes possible facets to be displayed on a first portion of the at least one display and enables the user to click on one or more facets to select those facets; the at least one processor causes metadata concerning electronic files identified by the filtering on a second portion of the at least one display and enables a user to select one or more of the electronic files; and the at least one processor causes the contents of the selected electronic files to be displayed on a third portion of the at least one display.
 14. The system of claim 11 wherein the at least one processor causes a graphical representation to be displayed on the at least one display wherein nodes in the graphical representation represent communication addresses associated with electronic files and lines between the nodes represent a number and direction of communications between the communications addresses.
 15. The system of claim 11 wherein the at least one processor causes filter types of the electronic files to be displayed on the at least one display.
 16. The system of claim 11 wherein the at least one processor causes a representation of a number of communications represented by the electronic files in each of a plurality of time segments over a period of time to be displayed on the at least one display.
 17. The system of claim 11 wherein the at least one processor causes domains of the electronic files to be displayed on the at least one display.
 18. The system of claim 11 wherein the at least one processor causes both the number of electronic files responsive to each selected facet and the number of electronic files responsive to each selected facet and no other selected facet to be displayed on the at least one display.
 19. The system of claim 11 wherein the at least one processor performs filtering employing an SQL query.
 20. The system of claim 11 wherein the at least one processor receives a sub-selection for further filtering of identified electronic files.